Add library_name, pipeline_tag, and arxiv metadata
This PR improves the model card for [WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning](https://huggingface.co/papers/2602.04634).
Changes:
- Added `library_name: transformers` to enable proper library detection
- Added `pipeline_tag: text-generation` to enable task filtering and widget support
- Added `arxiv:2602.04634` tag to link the model to the paper page
These metadata improvements will:
1. Make the model discoverable when users filter by text-generation task
2. Link the model to the paper page on Hugging Face
3. Enable the proper inference widget on the model page
4. Help users understand the model's intended use case
Please review and merge if everything looks good.
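As an illustration only (not part of this PR), the sketch below shows one way the added fields could be checked locally before merging. The `README` excerpt and the `front_matter` helper are hypothetical: the excerpt mirrors the front matter shown in this PR's diff, and the parser handles only the flat subset of YAML used there (scalars and lists of scalars). A real check would more likely rely on a proper YAML parser.

```python
import re

# Hypothetical README excerpt mirroring the front matter after this PR.
README = """---
language:
- en
base_model:
- Qwen/Qwen3-4B
library_name: transformers
pipeline_tag: text-generation
tags:
- arxiv:2602.04634
metrics:
- accuracy
---
# WideSeek-R1-4b
"""


def front_matter(text: str) -> dict:
    """Parse a flat YAML front matter block (scalars and lists of scalars only)."""
    match = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if match is None:
        return {}
    data = {}
    key = None
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        if line.startswith("- ") and key is not None and isinstance(data[key], list):
            # List item belonging to the most recent "key:" line.
            data[key].append(line[2:].strip())
        else:
            # "key: value" gives a scalar; a bare "key:" starts an empty list.
            key, _, value = line.partition(":")
            key = key.strip()
            value = value.strip()
            data[key] = value if value else []
    return data


meta = front_matter(README)
arxiv_tags = [tag for tag in meta.get("tags", []) if tag.startswith("arxiv:")]
print(meta.get("library_name"), meta.get("pipeline_tag"), arxiv_tags)
```

The list-item branch is checked before the `key: value` branch so that a tag such as `arxiv:2602.04634`, which itself contains a colon, is kept whole as a list entry.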
@@ -4,6 +4,10 @@ language:
 - en
 base_model:
 - Qwen/Qwen3-4B
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- arxiv:2602.04634
 metrics:
 - accuracy
 model-index:
@@ -31,9 +35,9 @@ model-index:
 
 
 
-Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability.
+Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability.
 
-In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks.
+In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks.
 
 Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0\% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
 
@@ -50,4 +54,4 @@ If you use this model in your research, please cite our paper:
 journal = {arXiv preprint arXiv:2602.04634},
 year = {2026},
 }
-```
+```