Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch Paper • 2602.03183 • Published 6 days ago • 8
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 9 days ago • 87
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models Paper • 2601.10387 • Published 24 days ago • 12
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models Paper • 2601.23143 • Published 9 days ago • 38
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors Paper • 2601.07226 • Published 28 days ago • 32
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8, 2025 • 76
How to Train Your LLM Web Agent: A Statistical Diagnosis Paper • 2507.04103 • Published Jul 5, 2025 • 52
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11, 2025 • 101
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think Paper • 2505.10185 • Published May 15, 2025 • 26
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Paper • 2504.17192 • Published Apr 24, 2025 • 123
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models? Paper • 2410.07571 • Published Oct 10, 2024 • 2
Reasoning Datasets Collection Reasoning datasets that are trending 🔥 • 10 items • Updated Jan 3, 2025 • 26
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10, 2025 • 75
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 13
view article Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models Jul 27, 2024 • 34