7B Chinese Chatbot Trained on Llama2-base 7B (Pure LoRA Training)
Introduction
After finishing the training of Llama2-chat 7B Chinese and Llama2-chat 13B Chinese, I was curious whether SFT (supervised fine-tuning) could be run directly on the Llama2-base series. Answering that question is the purpose of this model repository.
After RicardoLee/Llama2-base-7B-Chinese-50W-pre_release and RicardoLee/Llama2-base-7B-Chinese-50W-Full2LoRA, I finally found hyperparameters that train LoRA stably on the Llama2-base 7B model, and completed LoRA training on 500,000 samples. For more details, please refer to the Train Detail section.
The training data consists of 500,000 SFT samples drawn from the BELLE project.
Train Detail
Some details of the training:
- Training Framework: This model was trained with a modified version of the Chinese-LLaMA-Alpaca project.
- Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model. The tokenizer.model in LLama2 is identical to the one in LLama1, so in theory the tokenizer from the Chinese-LLaMa project can be reused as-is without any token misalignment.
- Training Parameters: LoRA rank: 64, LR: 4e-4, Warmup ratio: 0.001.
- Training Resources: 8 × V100 GPUs, 21 hours.
- Initial Loss: 9.1402
- Final Loss: 1.4104
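To put the figures above in perspective, here is a minimal sketch that converts the reported losses to perplexity and totals the GPU time. It assumes the reported loss is the mean token-level cross-entropy in nats, which the model card does not state; the variable and function names are illustrative only.

```python
import math

# Figures reported in the Train Detail list above.
initial_loss = 9.1402
final_loss = 1.4104
num_gpus = 8
wall_clock_hours = 21


def perplexity(loss: float) -> float:
    """Convert a loss to perplexity.

    Assumption: the loss is a mean cross-entropy measured in nats.
    """
    return math.exp(loss)


# Total compute budget in GPU-hours (GPUs multiplied by wall-clock hours).
gpu_hours = num_gpus * wall_clock_hours

print(f"GPU-hours: {gpu_hours}")
print(f"Initial perplexity: {perplexity(initial_loss):.1f}")
print(f"Final perplexity: {perplexity(final_loss):.2f}")
```

If the loss was logged in some other form (for example, summed rather than averaged), the perplexity figures would not apply.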
Inference
This model still uses the Stanford Alpaca template, so remember to add the prologue when testing. The prologue template is:
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n\n${Your Content}\n\n### Response:\n\n"
For dialogue with context, the prologue template is:
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n\nHuman:${Previous Human Content}\nAssistant:${Previous Machine Content}\nHuman:${Your Question}\n\n### Response:\n\n"
Licence
The models in this repository are open-sourced under the Apache-2.0 license, and use of the model weights must comply with the LLama2 MODEL LICENCE.
Future Work
I will gradually release the following models in the near future:
- Models trained on a larger scale of SFT data.
- Models trained on LLama2 and LLama2-chat, at 13B and smaller (since I only have V100 GPUs), for comparison.