TRIAGE — Hospital Crisis Agent (Qwen2.5-7B GRPO)

A GRPO fine-tuned version of Qwen2.5-7B specialized for hospital crisis management and clinical triage decision-making, trained as part of the TRIAGE multi-agent system.

Model Description

This model serves as the backbone for a 6-agent hospital crisis simulation that coordinates:

🚑 ER Triage Agent — Patient severity classification (START protocol)
🏥 ICU Management Agent — Bed allocation and overflow protocols
💊 Pharmacy Agent — Drug order validation and contraindication detection
👩‍⚕️ HR Rostering Agent — Emergency staff deployment
💻 IT Systems Agent — EHR integrity and system failure response
🎯 CMO Oversight Agent — Override decisions and crisis governance

Benchmark Results (TRIAGE Multi-Agent Benchmark)

Scenario	Survival Rate	Violation Detection	Reward
Mass Casualty	100%	100%	10.0/10.0
Disease Outbreak	100%	100%	10.0/10.0
Equipment Failure	100%	100%	10.0/10.0
Staff Shortage	100%	100%	10.0/10.0
Combined Surge	100%	100%	10.0/10.0

Composite Score: 87.33/100 [A]
(Conservative — 20-step episodes; 50-step runs expected to yield 92+)

Comparison to Existing Work

System	Model Size	Hospital Ops	RL Environment	Score
TRIAGE (this model)	4B	✅ Full 6-agent	✅ OpenEnv	87.3+
MedAgents (ACL 2024)	GPT-4 (1T+)	❌ QA only	❌ No env	N/A
Gemini 2.5 Flash	Undisclosed	❌ Single-agent	❌ No env	73.8% ESI

Training Details

Parameter	Value
Base model	Qwen/Qwen2.5-7B
Training method	GRPO (Generative Reward Policy Optimization)
LoRA rank	16
LoRA alpha	16
Quantization	4-bit NF4 (bitsandbytes)
Training hardware	NVIDIA T4 / P100 (16GB VRAM)
Dataset	300 highly curated prompts
Reward Verifiers	8 custom medical verifiers
Epochs	1
Optimizer	paged_adamw_8bit

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "user/triage-qwen-4b-grpo",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("user/triage-qwen-4b-grpo")

prompt = """Hospital Crisis Management System — Step 15
Crisis: mass_casualty | ICU: 45/60 beds | Critical patients: 8
Patients — Critical: 8, Untreated Critical: 3

What is the correct triage action?"""

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=150, temperature=0.1)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Limitations

For research and simulation purposes only
Not validated for real clinical deployment
Accuracy depends on prompt quality and crisis scenario complexity
Should not replace professional medical judgment

Citation

@software{triage2025,
  title={TRIAGE: Multi-Agent Hospital Crisis Simulation with DPO Fine-tuning},
  year={2025},
  note={Meta PyTorch OpenEnv Hackathon submission},
  url={https://github.com/YOUR_USERNAME/triage}
}

License

Apache 2.0 — see LICENSE file.

Downloads last month: 150

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for balarajr/triage-qwen2.5-7b-grpo

Base model

Qwen/Qwen2.5-7B

Adapter

(464)

this model

balarajr
/

triage-qwen2.5-7b-grpo