# Nexus GRPO v3 — env v2 (hidden objectives + composable rubrics)

A Qwen2.5-7B LoRA adapter trained on the GeoPolicy env v2 using GRPO, with stratified hidden objectives and frozen-LoRA opponents.

## Files

| Path | Contents |
| --- | --- |
| `checkpoints/best/` | Checkpoint with the highest average reward over the last 10 steps — recommended adapter |
| `checkpoints/checkpoint-{10,20,30,40,50}/` | Intermediate checkpoints (saved every 10 steps) |
| `checkpoints/final/` | Final-step adapter |
| `train_log.jsonl` | Per-step metrics: reward, KL, rubric components, action mix |
| `eval_3variants.json` | Three-variant baseline comparison (Base / +SFT / +SFT+GRPO) |
| `best_checkpoint.json` | Metadata for the auto-selected best checkpoint |
| `rollouts/step_*.jsonl` | Full per-rollout transcripts (saved every 5 steps) |
| `plots/` | Training and eval figures (PNG) |
| `run_config.json` | Hyperparameters and run config |
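For reference, the "highest average over the last 10 steps" selection recorded in `best_checkpoint.json` can be reproduced from `train_log.jsonl` along these lines. This is a minimal sketch, not the run's actual selection code; the `step` and `reward` field names are assumptions about the log schema.

```python
import json


def best_checkpoint(log_path, window=10, every=10):
    """Pick the saved checkpoint whose trailing-window mean reward is highest.

    Assumes one JSON object per line with (hypothetical) "step" and
    "reward" fields, and checkpoints saved every `every` steps.
    """
    rewards = {}
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            rewards[rec["step"]] = rec["reward"]

    best, best_avg = None, float("-inf")
    for ckpt in range(every, max(rewards) + 1, every):
        # Mean reward over the `window` steps ending at this checkpoint.
        tail = [rewards[s] for s in range(ckpt - window + 1, ckpt + 1) if s in rewards]
        avg = sum(tail) / len(tail)
        if avg > best_avg:
            best, best_avg = f"checkpoint-{ckpt}", avg
    return best, best_avg
```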

## Training summary

- **Algorithm:** GRPO, group_size=4, 50 steps
- **Hyperparams:** LR=1e-5, KL coefficient β=0.01, grad clip=10, temperature=1.0
- **Reward:** 0.5×mean(per-turn grade) + 0.5×final grade, both via the TaskRubric blend
- **SFT init:** local SFT on 200 balanced demonstrations (3 epochs)
- **Opponents:** frozen-LoRA (Option B) — the other 4 countries play with the frozen SFT adapter
- **Stratification:** Nexus rotated through all 8 hidden objectives (~6 episodes each)
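For context, GRPO scores each rollout relative to the other rollouts in its group (group_size=4 here) rather than against a learned value baseline. A minimal sketch of that advantage step, under the common variant that z-scores within the group (this is illustrative, not this run's training code):

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its rollout group.

    `rewards` holds the blended rewards for one group of rollouts sampled
    from the same prompt (group_size=4 in this run). Some GRPO variants
    only mean-center; this sketch also divides by the group std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Rollouts above the group mean get positive advantages and are reinforced; those below are pushed down, with the KL term (β=0.01) keeping the policy near the SFT init.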

See run_config.json for exact values.
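The reward bullet above amounts to the following blend. This is a sketch of the formula only; both inputs are assumed to already be TaskRubric-blended scores.

```python
def blended_reward(per_turn_grades, final_grade):
    """0.5 x mean of per-turn rubric grades + 0.5 x final-episode grade.

    Inputs are assumed to be TaskRubric-blended scores, e.g. in [0, 1].
    """
    turn_avg = sum(per_turn_grades) / len(per_turn_grades)
    return 0.5 * turn_avg + 0.5 * final_grade
```

For example, per-turn grades of [0.4, 0.6] with a final grade of 0.8 blend to 0.65.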

## Use the best checkpoint

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model, then attach the GRPO-trained adapter.
model, tokenizer = FastLanguageModel.from_pretrained("unsloth/Qwen2.5-7B-Instruct-bnb-4bit", ...)
model.load_adapter("adityadas14/nexus-grpo-v3", subfolder="checkpoints/best", adapter_name="default")
model.set_adapter("default")
```
Base model: Qwen/Qwen2.5-7B (this repository contains a LoRA adapter finetuned from it).