Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published 18 days ago • 208
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation Paper • 2511.20714 • Published Nov 25, 2025 • 48
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following Paper • 2511.21662 • Published Nov 26, 2025 • 11
Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published Nov 26, 2025 • 45
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published Nov 27, 2025 • 90
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment Paper • 2511.20614 • Published Nov 25, 2025 • 38
Table-R1: Inference-Time Scaling for Table Reasoning Paper • 2505.23621 • Published May 29, 2025 • 93
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13, 2025 • 191
Enabling Scalable Oversight via Self-Evolving Critic Paper • 2501.05727 • Published Jan 10, 2025 • 72