| An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning | Mar 4, 2025 | Mathematical Reasoning | Code Available | 0 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 |
| VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models | Mar 10, 2025 | Binary Classification, Hallucination | Code Available | 0 |
| Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models | Apr 22, 2024 | Decoder, Mathematical Reasoning | Code Available | 0 |
| Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models | Jun 18, 2024 | Mathematical Reasoning | Code Available | 0 |
| OmniRouter: Budget and Performance Controllable Multi-LLM Routing | Feb 27, 2025 | AI Agent, Mathematical Reasoning | Code Available | 0 |
| The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Feb 21, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition | Apr 10, 2024 | Code Generation, Mathematical Reasoning | Code Available | 0 |
| Analysing Mathematical Reasoning Abilities of Neural Models | Apr 2, 2019 | Mathematical Question Answering, Mathematical Reasoning | Code Available | 0 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | Diagnostic, Math | Code Available | 0 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Aug 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 0 |
| MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models | Oct 19, 2023 | Hallucination, Mathematical Reasoning | Code Available | 0 |
| Scaling up the think-aloud method | May 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Sep 30, 2024 | GSM8K, Math | Code Available | 0 |
| Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Mar 23, 2025 | GSM8K, Math | Code Available | 0 |
| SCOPE: Compress Mathematical Reasoning Steps for Efficient Automated Process Annotation | May 20, 2025 | Mathematical Reasoning | Code Available | 0 |
| LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoning, reinforcement-learning | Code Available | 0 |
| LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | May 24, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 0 |
| LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Jun 7, 2024 | Mathematical Reasoning, Multiple-choice | Code Available | 0 |
| Bridging the Reasoning Gap: Small LLMs Can Plan with Generalised Strategies | Jan 31, 2025 | Mathematical Reasoning | Code Available | 0 |
| Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Feb 23, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | Jun 10, 2025 | Mathematical Reasoning, Visual Reasoning | Code Available | 0 |
| Blank Collapse: Compressing CTC emission for the faster decoding | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| LLM2: Let Large Language Models Harness System 2 Reasoning | Dec 29, 2024 | GSM8K, Mathematical Reasoning | Code Available | 0 |