Guided Self-Evolving LLMs with Minimal Human Supervision • Paper • 2512.02472 • Published 27 days ago • 50
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents • Paper • 2510.07172 • Published Oct 8 • 28
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision • Paper • 2507.20976 • Published Jul 28 • 10
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors • Paper • 2505.23001 • Published May 29 • 8