Scaling Open-Ended Reasoning to Predict the Future Paper • 2512.25070 • Published about 9 hours ago
Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision Paper • 2509.14234 • Published Sep 17, 2025
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11, 2025
answer-matching Collection Free-form datasets, human annotations, and sample-level model outputs for "Answer Matching Outperforms Multiple Choice for Language Model Evaluation" • 2 items • Updated Jul 3, 2025
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published Feb 6, 2025