| Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | May 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | Math, Mathematical Reasoning | Code Available | 3 |
| Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt | May 29, 2025 | Mathematical Reasoning | Unverified | 0 |
| AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning | May 29, 2025 | Geometry Problem Solving, Mathematical Reasoning | Unverified | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | May 28, 2025 | Imitation Learning, Math | Code Available | 1 |
| Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models | May 27, 2025 | Mathematical Reasoning | Unverified | 0 |
| Reinforcing General Reasoning without Verifiers | May 27, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | May 27, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | May 26, 2025 | ARC, Logical Reasoning | Unverified | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | Hallucination, Math | Code Available | 0 |
| Improving Multilingual Math Reasoning for African Languages | May 26, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation | May 26, 2025 | Mathematical Reasoning | Unverified | 0 |
| ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment | May 25, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling | May 25, 2025 | Computational Efficiency, Mathematical Reasoning | Code Available | 1 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs | May 25, 2025 | Machine Translation, Mathematical Reasoning | Code Available | 1 |
| SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking | May 25, 2025 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 0 |
| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | May 24, 2025 | Automated Theorem Proving, Math | Code Available | 0 |
| LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | May 24, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 0 |
| Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation | May 24, 2025 | Mathematical Reasoning, Multimodal Reasoning | Unverified | 0 |
| Efficient Long CoT Reasoning in Small Language Models | May 24, 2025 | Mathematical Reasoning, valid | Unverified | 0 |
| Unraveling Misinformation Propagation in LLM Reasoning | May 24, 2025 | Mathematical Reasoning, Misinformation | Code Available | 0 |
| PPT: A Process-based Preference Learning Framework for Self Improving Table Question Answering Models | May 23, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |