Article • Community Evals: Because we're done trusting black-box leaderboards over the community • 4 days ago • 39
Reply • Just a heads-up in case any of the maintainers see this: the embedded Gradio Space at the top of the article seems to be broken.
Instruction Pretrained Experiments Collection Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study' • 3 items • Updated Dec 11, 2025 • 1
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 8 days ago • 82
MMFineReason Collection High-quality STEM reasoning dataset for Multimodal LLM post-training. • 14 items • Updated 4 days ago • 20
Continually pre-trained models Collection Language-specific LLMs continually pre-trained from fully open English base models • 2 items • Updated 17 days ago • 1
MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 16 items • Updated 9 days ago • 13