Instructions for using RicardoLee/Llama2-base-7B-Chinese-50W-LoRA with libraries and local apps.

Libraries

How to use RicardoLee/Llama2-base-7B-Chinese-50W-LoRA with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RicardoLee/Llama2-base-7B-Chinese-50W-LoRA")
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RicardoLee/Llama2-base-7B-Chinese-50W-LoRA")
model = AutoModelForCausalLM.from_pretrained("RicardoLee/Llama2-base-7B-Chinese-50W-LoRA")
```

Local Apps

vLLM

How to use RicardoLee/Llama2-base-7B-Chinese-50W-LoRA with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RicardoLee/Llama2-base-7B-Chinese-50W-LoRA"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "RicardoLee/Llama2-base-7B-Chinese-50W-LoRA",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
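
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the commands above is listening on localhost:8000, and the helper names `build_request` and `complete` are hypothetical, chosen for this illustration.

```python
import json
import urllib.request

# Assumed values: they mirror the curl example above.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "RicardoLee/Llama2-base-7B-Chinese-50W-LoRA"


def build_request(prompt, max_tokens=512, temperature=0.5, url=BASE_URL):
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # The OpenAI-compatible completions response lists results under "choices".
    return body["choices"][0]["text"]
```

Once the server is running, `complete("Once upon a time,")` should return the generated continuation. Pointing `url` at port 30000 would target the SGLang server described below instead.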

Use Docker

```
docker model run hf.co/RicardoLee/Llama2-base-7B-Chinese-50W-LoRA
```

SGLang

How to use RicardoLee/Llama2-base-7B-Chinese-50W-LoRA with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RicardoLee/Llama2-base-7B-Chinese-50W-LoRA" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "RicardoLee/Llama2-base-7B-Chinese-50W-LoRA",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RicardoLee/Llama2-base-7B-Chinese-50W-LoRA" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "RicardoLee/Llama2-base-7B-Chinese-50W-LoRA",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Docker Model Runner
How to use RicardoLee/Llama2-base-7B-Chinese-50W-LoRA with Docker Model Runner:
```
docker model run hf.co/RicardoLee/Llama2-base-7B-Chinese-50W-LoRA
```

7B Chinese Chatbot Trained on Llama2-base 7B (Pure LoRA Training)

Introduction

After finishing the training of Llama2-chat 7B Chinese and Llama2-chat 13B Chinese, I was very curious whether SFT (supervised fine-tuning) could be run directly on the Llama2-base series. Answering that question is the purpose of this model repository.

After RicardoLee/Llama2-base-7B-Chinese-50W-pre_release and RicardoLee/Llama2-base-7B-Chinese-50W-Full2LoRA, I finally found hyperparameters that train LoRA stably on the Llama2-base 7B model, and completed LoRA training on 500,000 samples (the "50W" in the repository name). For more details, please refer to the Train Detail section.

The training data consists of 500,000 SFT samples drawn from the BELLE project.

Train Detail

Some details of the training:

Training Framework: This model was trained with a modified version of the Chinese-LLaMA-Alpaca project.
Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model. The tokenizer.model in Llama2 is identical to the one used in Llama1, so in theory the tokenizer from the Chinese-LLaMA project can be reused as-is without any token misalignment.
Training Parameters: LoRA rank: 64, LR: 4e-4, Warmup ratio: 0.001.
Training Resource: 8 × V100 GPUs, 21 hours.
Initial Loss: 9.1402
Final Loss: 1.4104
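
To give a feel for what a warmup ratio of 0.001 means in practice, here is a rough sketch that converts it into a step count. It assumes the common convention that warmup steps equal the ceiling of total steps times the warmup ratio. The global batch size and epoch count are not stated in this card, so the values below are placeholders for illustration only, and the function name `warmup_steps` is hypothetical.

```python
import math

# Values reported in the training details above.
NUM_SAMPLES = 500_000
WARMUP_RATIO = 0.001

# Assumptions for illustration only; the card does not state them.
ASSUMED_GLOBAL_BATCH_SIZE = 128
ASSUMED_EPOCHS = 1


def warmup_steps(num_samples, global_batch_size, epochs, warmup_ratio):
    """Return (total optimizer steps, warmup steps) for a training run."""
    steps_per_epoch = math.ceil(num_samples / global_batch_size)
    total_steps = steps_per_epoch * epochs
    # Assumed convention: round the warmup fraction up to a whole step.
    return total_steps, math.ceil(total_steps * warmup_ratio)


total, warmup = warmup_steps(
    NUM_SAMPLES, ASSUMED_GLOBAL_BATCH_SIZE, ASSUMED_EPOCHS, WARMUP_RATIO
)
print(f"total steps: {total}, warmup steps: {warmup}")
```

Under these assumed values the run has 3907 optimizer steps, of which only 4 are warmup, so the learning rate reaches 4e-4 almost immediately. A different batch size or epoch count changes the totals but not that overall picture.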

Inference

This model still uses the Stanford Alpaca template, so don't forget to add the prologue when testing. The prologue template is:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n\n${Your Content}\n\n### Response:\n\n"

For dialogue with context, the prologue template is:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n\nHuman:${Previous Human Content}\nAssistant:${Previous Assistant Content}\nHuman:${Your Question}\n\n### Response:\n\n"

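The two templates above can be assembled with a small helper. This is a minimal sketch rather than code from the training project: the function name `build_prompt` and the shape of the `history` argument (a list of (human, assistant) pairs) are assumptions made for illustration.

```python
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)


def build_prompt(question, history=None):
    """Wrap a question in the Alpaca template described in this section.

    `history` is an optional list of (human, assistant) string pairs from
    earlier turns; this argument shape is an assumption of the sketch.
    """
    if history:
        # Multi-turn: earlier turns are prefixed with Human:/Assistant:.
        turns = "".join(f"Human:{h}\nAssistant:{a}\n" for h, a in history)
        instruction = f"{turns}Human:{question}"
    else:
        # Single-turn: the question goes in without a speaker prefix.
        instruction = question
    return f"{ALPACA_HEADER}### Instruction:\n\n{instruction}\n\n### Response:\n\n"
```

For example, `build_prompt("Hello")` yields the single-turn prologue, and `build_prompt("And 3+3?", history=[("What is 2+2?", "4")])` yields the dialogue form. The returned string is what should be passed as the prompt to the model or to a completions endpoint.
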
Licence

The models in this repository are open-sourced under the Apache-2.0 license, and use of the model weights must comply with the Llama2 model license.

Future Work

I will gradually release the following models in the near future:

Models trained on larger-scale SFT data.
Models based on Llama2 and Llama2-chat at 13B and below (since I only have V100s), for comparison.
