JMMMU-Pro: Image-based Japanese Multi-discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction Paper • 2512.14620 • Published Dec 16, 2025 • 2
JMMMU-Pro: Image-based Japanese Multi-discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction Paper • 2512.14620 • Published Dec 16, 2025 • 2
JMMMU-Pro: Image-based Japanese Multi-discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction Paper • 2512.14620 • Published Dec 16, 2025 • 2
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper Paper • 2511.04583 • Published Nov 6, 2025 • 2
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper Paper • 2511.04583 • Published Nov 6, 2025 • 2
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper Paper • 2511.04583 • Published Nov 6, 2025 • 2 • 2
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks Paper • 2506.01952 • Published Jun 2, 2025 • 10 • 3