| Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | Mar 27, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data Visualization, Math | Code Available | 0 |
| Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators | Mar 25, 2025 | Math | Unverified | 0 |
| Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Mar 25, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| 1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training | Mar 25, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Gemma 3 Technical Report | Mar 25, 2025 | Instruction Following, Math | Unverified | 0 |
| Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning | Mar 24, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling | Mar 24, 2025 | Continual Pretraining, Language Modeling | Unverified | 0 |
| Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels | Mar 24, 2025 | Math | Unverified | 0 |
| MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | Mar 23, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |