FoundationalASSIST: An Educational Dataset for Foundational Knowledge Tracing and Pedagogical Grounding of LLMs Paper • 2602.00070 • Published Jan 20 • 2
Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning Paper • 2603.19607 • Published 24 days ago • 3
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions Paper • 2502.13124 • Published Feb 18, 2025 • 8
Optimization-Guided Diffusion for Interactive Scene Generation Paper • 2512.07661 • Published Dec 8, 2025 • 5
ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents Paper • 2604.02834 • Published 10 days ago • 2
Infinity Instruct Collection Scaling Instruction Selection and Synthesis to Enhance Language Models • 17 items • Updated Feb 4 • 11
OctoPack: Instruction Tuning Code Large Language Models Paper • 2308.07124 • Published Aug 14, 2023 • 33
ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget Paper • 2604.01195 • Published 12 days ago • 3
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios Paper • 2603.28130 • Published 14 days ago • 11
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge Paper • 1803.05457 • Published Mar 14, 2018 • 4
FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching Paper • 2604.06757 • Published 5 days ago • 10
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization Paper • 2604.07343 • Published 5 days ago • 11
Unsupervised Welding Defect Detection Using Audio And Video Paper • 2409.02290 • Published Sep 3, 2024 • 3
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models Paper • 2308.11462 • Published Aug 20, 2023 • 6
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents Paper • 2603.24329 • Published 18 days ago • 28
RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis Paper • 2404.16754 • Published Apr 25, 2024 • 2