Agent tuning
zai-org/SWE-Dev-train • Updated Jul 9, 2025 • 20.1k • 313 • 18
SWE-Gym/OpenHands-SFT-Trajectories • Updated May 10, 2025 • 491 • 107 • 14
lmarena-ai/webdev-arena-preference-10k • Updated Mar 10, 2025 • 10.5k • 139 • 15
SWE-bench/SWE-smith-trajectories • Updated Jul 19, 2025 • 76k • 1.52k • 41
Agent Benchmarks
xw27/scibench • Updated May 6, 2024 • 692 • 635 • 22
google/frames-benchmark • Updated Oct 15, 2024 • 824 • 8.17k • 239
gaia-benchmark/GAIA • Updated Oct 28, 2025 • 932 • 15.9k • 588
HuggingFaceH4/MATH-500 • Updated Dec 15, 2025 • 500 • 97.4k • 281
Experimental Evaluation 📊 • Plan and generate experimental validation methods for AI projects
akseljoonas/qwen3-4b-dpo-hh-rlhf-reversed Text Generation • 4B • Updated about 8 hours ago • 30