Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published about 1 month ago • 50
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8, 2025 • 28
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision Paper • 2507.20976 • Published Jul 28, 2025 • 10
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors Paper • 2505.23001 • Published May 29, 2025 • 8