Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone Paper • 2512.22615 • Published 7 days ago • 39
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 11 days ago • 48
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives Paper • 2512.14699 • Published 18 days ago • 27
Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection Paper • 2512.16905 • Published 16 days ago • 30
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors Paper • 2512.16915 • Published 16 days ago • 37
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives Paper • 2512.14699 • Published 18 days ago • 27
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published 19 days ago • 72
LayerFlow: A Unified Model for Layer-aware Video Generation Paper • 2506.04228 • Published Jun 4, 2025 • 13
DiffDoctor: Diagnosing Image Diffusion Models Before Treating Paper • 2501.12382 • Published Jan 21, 2025
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation Paper • 2512.07831 • Published 26 days ago • 16
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO Paper • 2511.16669 • Published Nov 20, 2025 • 31
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published Oct 9, 2025 • 71
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 96
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 177
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention Paper • 2510.13940 • Published Oct 15, 2025 • 6