# Nexus GRPO v3 — env v2 (hidden objectives + composable rubrics)
Qwen 2.5 7B (LoRA) trained on the GeoPolicy env v2 using GRPO with stratified hidden objectives and frozen-LoRA opponents.
## Files

| Path | Contents |
|---|---|
| `checkpoints/best/` | Highest avg(last 10) reward checkpoint — recommended adapter |
| `checkpoints/checkpoint-{10,20,30,40,50}/` | Intermediate checkpoints (every 10 steps) |
| `checkpoints/final/` | Final-step adapter |
| `train_log.jsonl` | Per-step metrics: reward, KL, rubric components, action mix |
| `eval_3variants.json` | 3-variant baseline comparison (Base / +SFT / +SFT+GRPO) |
| `best_checkpoint.json` | Auto-selected best-checkpoint metadata |
| `rollouts/step_*.jsonl` | Full per-rollout transcripts (every 5 steps) |
| `plots/` | Training and eval figures (PNG) |
| `run_config.json` | Hyperparameters and run config |
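A minimal sketch of inspecting the per-step log, assuming each line of `train_log.jsonl` is a JSON object with `step` and `reward` fields (the field names are assumptions about the log schema, not verified against this repo):

```python
import json

# Read per-step metrics from the JSONL training log (one JSON object per line).
steps, rewards = [], []
with open("train_log.jsonl") as f:
    for line in f:
        record = json.loads(line)
        steps.append(record["step"])      # assumed field name
        rewards.append(record["reward"])  # assumed field name

# Average reward over the last 10 steps — the criterion used to select checkpoints/best/.
last10 = rewards[-10:]
print(f"avg(last 10) reward: {sum(last10) / len(last10):.4f}")
```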
## Training summary

- Algorithm: GRPO, group_size=4, 50 steps
- Hyperparams: LR=1e-5, KL β=0.01, grad_clip=10, temperature=1.0
- Reward: 0.5×mean(per_turn) + 0.5×final_grade, both via the `TaskRubric` blend (see the sketch after this list)
- SFT init: local SFT on 200 balanced demonstrations (3 epochs)
- Opponents: frozen-LoRA (Option B) — the other 4 countries play with a frozen SFT adapter
- Stratification: Nexus rotated through all 8 hidden objectives (~6 rollouts each)

See `run_config.json` for exact values.
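Below is a hedged sketch of the reward blend and GRPO group-normalized advantages described above. `blended_reward` and `group_advantages` are illustrative placeholders, not functions from this repo; the actual per-turn and final scores come from the env's `TaskRubric`.

```python
from statistics import mean, pstdev

def blended_reward(per_turn_scores: list[float], final_grade: float) -> float:
    # 0.5 × mean(per-turn rubric scores) + 0.5 × final grade, as listed above.
    return 0.5 * mean(per_turn_scores) + 0.5 * final_grade

def group_advantages(rewards: list[float]) -> list[float]:
    # GRPO normalizes rewards within each sampled group (group_size=4 here):
    # advantage_i = (r_i - mean(group)) / std(group).
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against zero std when all rewards tie
    return [(r - mu) / sigma for r in rewards]

# Example: one group of 4 rollouts for the same prompt.
group = [
    blended_reward([0.6, 0.7, 0.5], 0.8),
    blended_reward([0.4, 0.5, 0.3], 0.2),
    blended_reward([0.9, 0.8, 0.7], 0.9),
    blended_reward([0.5, 0.5, 0.5], 0.5),
]
print(group_advantages(group))
```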
## Use the best checkpoint

```python
from peft import PeftModel
from unsloth import FastLanguageModel

# Load the 4-bit base model, then attach the GRPO adapter from checkpoints/best/.
model, tokenizer = FastLanguageModel.from_pretrained("unsloth/Qwen2.5-7B-Instruct-bnb-4bit", ...)  # fill in max_seq_length / dtype / load_in_4bit as needed
model.load_adapter("adityadas14/nexus-grpo-v3", subfolder="checkpoints/best", adapter_name="default")
model.set_adapter("default")
```
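Once the adapter is active, generation works like any Unsloth/Transformers model. A usage sketch follows, assuming a chat-style prompt; the prompt text is illustrative only, not taken from the env.

```python
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

messages = [{"role": "user", "content": "Summarize Nexus's opening negotiation stance."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=1.0)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```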
## Model tree

- Base model: Qwen/Qwen2.5-7B
- Finetuned: Qwen/Qwen2.5-7B-Instruct
- Quantized: unsloth/Qwen2.5-7B-Instruct-bnb-4bit (the adapter in this repo is trained on top of this 4-bit model)