LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 6 days ago
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published 7 days ago
DeContext as Defense: Safe Image Editing in Diffusion Transformers Paper • 2512.16625 • Published 11 days ago
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 13 days ago
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24
Artificial Hippocampus Networks for Efficient Long-Context Modeling Paper • 2510.07318 • Published Oct 8
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding Paper • 2505.16990 • Published May 22
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering Paper • 2503.16422 • Published Mar 20