Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published 9 days ago • 22
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale Paper • 2601.22146 • Published 16 days ago • 9
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published Nov 27, 2025 • 6
SindBERT, the Sailor: Charting the Seas of Turkish NLP Paper • 2510.21364 • Published Oct 24, 2025 • 1
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 9
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 9
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 9
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian Paper • 2509.05668 • Published Sep 6, 2025 • 6
Multilingual State Space Models for Structured Question Answering in Indic Languages Paper • 2502.01673 • Published Feb 1, 2025 • 2
A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings Paper • 2505.12116 • Published May 17, 2025
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research Paper • 2505.11855 • Published May 17, 2025 • 10