| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code GenerationMath | CodeCode Available | 0 | 5 |
| Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4 | Sep 9, 2024 | Abstract AlgebraAutomated Theorem Proving | CodeCode Available | 0 | 5 |
| Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Jun 22, 2021 | Deep LearningLogical Reasoning | CodeCode Available | 0 | 5 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data VisualizationMath | CodeCode Available | 0 | 5 |
| Reverse Operation based Data Augmentation for Solving Math Word Problems | Oct 4, 2020 | Data AugmentationMath | CodeCode Available | 0 | 5 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | MathMathematical Reasoning | CodeCode Available | 0 | 5 |
| PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment | Nov 18, 2024 | Mathematical Reasoning | CodeCode Available | 0 | 5 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | MathMathematical Reasoning | CodeCode Available | 0 | 5 |
| A Survey of Deep Learning for Geometry Problem Solving | Jul 16, 2025 | Deep LearningGeometry Problem Solving | CodeCode Available | 0 | 5 |
| Reasoning over Uncertain Text by Generative Large Language Models | Feb 14, 2024 | Decision MakingMathematical Reasoning | CodeCode Available | 0 | 5 |
| Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting | Feb 9, 2023 | Mathematical ReasoningNatural Language Inference | CodeCode Available | 0 | 5 |
| Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | May 29, 2025 | Mathematical Reasoning | CodeCode Available | 0 | 5 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical ReasoningSpatial Reasoning | CodeCode Available | 0 | 5 |
| Planning and Editing What You Retrieve for Enhanced Tool Learning | Mar 30, 2024 | Mathematical ReasoningRetrieval | CodeCode Available | 0 | 5 |
| Process-based Self-Rewarding Language Models | Mar 5, 2025 | Mathematical Reasoning | CodeCode Available | 0 | 5 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data AugmentationMath | CodeCode Available | 0 | 5 |
| Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement | Feb 18, 2024 | Mathematical ReasoningText Generation | CodeCode Available | 0 | 5 |
| Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Apr 17, 2024 | FormLanguage Model Evaluation | CodeCode Available | 0 | 5 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | MathMathematical Problem-Solving | CodeCode Available | 0 | 5 |
| Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction | Jun 2, 2024 | Mathematical Reasoning | CodeCode Available | 0 | 5 |
| Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Jun 11, 2025 | Mathematical Reasoning | CodeCode Available | 0 | 5 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | HallucinationMath | CodeCode Available | 0 | 5 |
| NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Jun 12, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Can A Gamer Train A Mathematical Reasoning Model? | Jun 10, 2025 | GPUMathematical Reasoning | CodeCode Available | 0 | 5 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | AllGSM8K | CodeCode Available | 0 | 5 |