| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | May 24, 2025 | Automated Theorem Proving, Math | Code Available | 0 | 5 |
| Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models | Apr 22, 2024 | Decoder, Mathematical Reasoning | Code Available | 0 | 5 |
| Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Jun 22, 2021 | Deep Learning, Logical Reasoning | Code Available | 0 | 5 |
| RoMath: A Mathematical Reasoning Benchmark in Romanian | Sep 17, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Jun 6, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 0 | 5 |
| Bridging the Reasoning Gap: Small LLMs Can Plan with Generalised Strategies | Jan 31, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| Agentic-R1: Distilled Dual-Strategy Reasoning | Jul 8, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment | Nov 18, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Feb 23, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| SWI: Speaking with Intent in Large Language Models | Mar 27, 2025 | Mathematical Reasoning, Question Answering | Code Available | 0 | 5 |
| Position: AI Evaluation Should Learn from How We Test Humans | Jun 18, 2023 | Mathematical Reasoning, Position | Code Available | 0 | 5 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical Reasoning, Spatial Reasoning | Code Available | 0 | 5 |
| Planning and Editing What You Retrieve for Enhanced Tool Learning | Mar 30, 2024 | Mathematical Reasoning, Retrieval | Code Available | 0 | 5 |
| OmniRouter: Budget and Performance Controllable Multi-LLM Routing | Feb 27, 2025 | AI Agent, Mathematical Reasoning | Code Available | 0 | 5 |
| Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement | Feb 18, 2024 | Mathematical Reasoning, Text Generation | Code Available | 0 | 5 |
| Blank Collapse: Compressing CTC emission for the faster decoding | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Reasoning over Uncertain Text by Generative Large Language Models | Feb 14, 2024 | Decision Making, Mathematical Reasoning | Code Available | 0 | 5 |
| Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic | Nov 3, 2022 | Arithmetic Reasoning, Language Modeling | Code Available | 0 | 5 |
| Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think | Apr 29, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | Diagnostic, Math | Code Available | 0 | 5 |
| Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Apr 17, 2024 | Form, Language Model Evaluation | Code Available | 0 | 5 |
| Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | May 29, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| Scaling Reasoning can Improve Factuality in Large Language Models | May 16, 2025 | Knowledge Graphs, Large Language Model | Code Available | 0 | 5 |
| Do LLM Evaluators Prefer Themselves for a Reason? | Apr 4, 2025 | Benchmarking, Code Generation | Code Available | 0 | 5 |
| Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Jun 11, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | All, GSM8K | Code Available | 0 | 5 |
| Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Nov 27, 2024 | In-Context Learning, Math | Code Available | 0 | 5 |
| NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Jun 12, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8K, Language Modeling | Code Available | 0 | 5 |
| Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English | Dec 24, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning | Jun 12, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 0 | 5 |
| Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | Dec 22, 2024 | Decision Making, Machine Translation | Code Available | 0 | 5 |
| KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Feb 6, 2025 | Mathematical Reasoning, Quantization | Code Available | 0 | 5 |
| Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence | Mar 26, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| MultiLingPoT: Enhancing Mathematical Reasoning with Multilingual Program Fine-tuning | Dec 17, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| On-Policy RL with Optimal Reward Baseline | May 29, 2025 | Large Language Model, Mathematical Reasoning | Code Available | 0 | 5 |
| MoD: A Distribution-Based Approach for Merging Large Language Models | Nov 1, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Compositional Generalization with Tree Stack Memory Units | Nov 5, 2019 | Mathematical Reasoning, Zero-shot Generalization | Code Available | 0 | 5 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | May 19, 2025 | Decoder, Image Generation | Code Available | 0 | 5 |
| Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models | Nov 19, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Jan 21, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 0 | 5 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 | 5 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 | 5 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation | Jul 2, 2024 | Code Generation, Form | Code Available | 0 | 5 |
| Large Language Models for Mathematical Analysis | Dec 28, 2024 | Mathematical Problem-Solving, Mathematical Reasoning | Code Available | 0 | 5 |
| Instructing Large Language Models to Identify and Ignore Irrelevant Conditions | Mar 19, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | Jun 24, 2023 | Decoder, Ingenuity | Code Available | 0 | 5 |