Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published 26 days ago • 30
VideoWorld 2: Learning Transferable Knowledge from Real-world Videos Paper • 2602.10102 • Published Feb 10 • 14
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published Jan 22 • 14
Inference-time Physics Alignment of Video Generative Models with Latent World Models Paper • 2601.10553 • Published Jan 15 • 12 • 5
ThinkGen: Generalized Thinking for Visual Generation Paper • 2512.23568 • Published Dec 29, 2025 • 1
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published Dec 16, 2025 • 72
SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published Dec 23, 2025 • 43
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published Dec 10, 2025 • 74
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling Paper • 2505.23155 • Published May 29, 2025 • 2
Cosmos Predict 2.5 & Transfer 2.5: Evolving the World Foundation Models for Physical AI Article • Oct 28, 2025 • 21
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26, 2025 • 15
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper • 2509.09674 • Published Sep 11, 2025 • 80
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs Paper • 2509.09174 • Published Sep 11, 2025 • 62
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10, 2025 • 130