Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models Paper • 2603.25750 • Published 24 days ago • 36
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning Paper • 2603.22057 • Published 21 days ago • 45 • 4
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 27 days ago • 153
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 99
Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation Paper • 2512.17040 • Published Dec 18, 2025 • 29
Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure Paper • 2512.14336 • Published Dec 16, 2025 • 32
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 123
Qwen Image Edit Camera Control 🎬 Space • Running on Zero • MCP • Featured • 2.17k • Fast 4-step inference with Qwen Image Edit 2509
PHUMA: Physically-Grounded Humanoid Locomotion Dataset Paper • 2510.26236 • Published Oct 30, 2025 • 30
ACG: Action Coherence Guidance for Flow-based VLA models Paper • 2510.22201 • Published Oct 25, 2025 • 37
EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization Paper • 2303.01904 • Published Mar 3, 2023
DesignLab: Designing Slides Through Iterative Detection and Correction Paper • 2507.17202 • Published Jul 23, 2025 • 51
Token Bottleneck: One Token to Remember Dynamics Paper • 2507.06543 • Published Jul 9, 2025 • 20
ProLIP Collection Official ProLIP weights, Probabilistic Language-Image Pre-Training (ICLR 2025) • 7 items • Updated Apr 18, 2025 • 10