RLPR
Collection
Extrapolating RLVR to General Domains without Verifiers โข 6 items โข Updated โข 6
How to use openbmb/RLPR-Qwen2.5-7B-Base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="openbmb/RLPR-Qwen2.5-7B-Base")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages) # Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("openbmb/RLPR-Qwen2.5-7B-Base")
model = AutoModelForCausalLM.from_pretrained("openbmb/RLPR-Qwen2.5-7B-Base")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))How to use openbmb/RLPR-Qwen2.5-7B-Base with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/RLPR-Qwen2.5-7B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "openbmb/RLPR-Qwen2.5-7B-Base",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'docker model run hf.co/openbmb/RLPR-Qwen2.5-7B-Base
How to use openbmb/RLPR-Qwen2.5-7B-Base with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "openbmb/RLPR-Qwen2.5-7B-Base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "openbmb/RLPR-Qwen2.5-7B-Base",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "openbmb/RLPR-Qwen2.5-7B-Base" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "openbmb/RLPR-Qwen2.5-7B-Base",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'How to use openbmb/RLPR-Qwen2.5-7B-Base with Docker Model Runner:
docker model run hf.co/openbmb/RLPR-Qwen2.5-7B-Base
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("openbmb/RLPR-Qwen2.5-7B-Base")
model = AutoModelForCausalLM.from_pretrained("openbmb/RLPR-Qwen2.5-7B-Base")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))RLPR-Qwen2.5-7B-Base is trained from Qwen2.5-7B-Base with the RLPR framework, which eliminates reliance on external verifiers and is simple and generalizable for more domains.
Usage adopted from Qwen2.5-7B-Instruct
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "openbmb/RLPR-Qwen2.5-7B-Base"
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "How much energy is produced when the sun converts one kg of hydrogen into helium?."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=512
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
If you find our model/code/paper helpful, please consider citing our papers ๐:
@misc{yu2025rlprextrapolatingrlvrgeneral,
title={RLPR: Extrapolating RLVR to General Domains without Verifiers},
author={Tianyu Yu and Bo Ji and Shouli Wang and Shu Yao and Zefan Wang and Ganqu Cui and Lifan Yuan and Ning Ding and Yuan Yao and Zhiyuan Liu and Maosong Sun and Tat-Seng Chua},
year={2025},
eprint={2506.18254},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2506.18254},
}
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="openbmb/RLPR-Qwen2.5-7B-Base") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)