Trainee2Trainer
Collection
These are the checkpoints and dataset for "From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning" • 3 items
Large Language Models
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It