The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution • Paper • 2510.25726 • Published Oct 29, 2025
TheMCPCompany: Creating General-purpose Agents with Task-specific Tools • Paper • 2510.19286 • Published Oct 22, 2025
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky • Paper • 2507.03336 • Published Jul 4, 2025