Collections
Discover the best community collections!
Collections including paper arxiv:2601.22153
-
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Paper • 2601.15369 • Published • 20 -
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper • 2601.15892 • Published • 53 -
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper • 2601.16208 • Published • 51 -
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper • 2601.11004 • Published • 30
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 39 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 35 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 44 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 32
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
openai/gpt-oss-120b
Text Generation • 120B • Updated • 3.25M • • 4.46k -
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper • 2512.20605 • Published • 62 -
Nested Browser-Use Learning for Agentic Information Seeking
Paper • 2512.23647 • Published • 19 -
TimeBill: Time-Budgeted Inference for Large Language Models
Paper • 2512.21859 • Published • 25
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 30 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
Foundation Models in Robotics: Applications, Challenges, and the Future
Paper • 2312.07843 • Published • 16 -
Neural Fields in Robotics: A Survey
Paper • 2410.20220 • Published • 5 -
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
Paper • 2410.22325 • Published • 10 -
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
Paper • 2410.21845 • Published • 16
-
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Paper • 2601.15369 • Published • 20 -
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper • 2601.15892 • Published • 53 -
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper • 2601.16208 • Published • 51 -
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper • 2601.11004 • Published • 30
-
openai/gpt-oss-120b
Text Generation • 120B • Updated • 3.25M • • 4.46k -
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper • 2512.20605 • Published • 62 -
Nested Browser-Use Learning for Agentic Information Seeking
Paper • 2512.23647 • Published • 19 -
TimeBill: Time-Budgeted Inference for Large Language Models
Paper • 2512.21859 • Published • 25
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 39 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 35 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 44 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 32
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 30 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
Foundation Models in Robotics: Applications, Challenges, and the Future
Paper • 2312.07843 • Published • 16 -
Neural Fields in Robotics: A Survey
Paper • 2410.20220 • Published • 5 -
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
Paper • 2410.22325 • Published • 10 -
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
Paper • 2410.21845 • Published • 16