| Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning | Jun 12, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models | Feb 6, 2024 | Mathematical Reasoning, Variable Selection | —Unverified | 0 | 0 |
| Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning | May 20, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| BitNet b1.58 2B4T Technical Report | Apr 16, 2025 | Computational Efficiency, CPU | —Unverified | 0 | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic Reasoning, Few-Shot Learning | —Unverified | 0 | 0 |
| Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Aug 28, 2024 | Knowledge Distillation, Language Modelling | —Unverified | 0 | 0 |
| Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning | May 22, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics | Apr 1, 2025 | Math, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Jun 6, 2024 | Arithmetic Reasoning, Code Generation | —Unverified | 0 | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments | Dec 16, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | Oct 17, 2023 | Mathematical Reasoning, Sentiment Analysis | —Unverified | 0 | 0 |
| Can Large Language Models Invent Algorithms to Improve Themselves? | Oct 21, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | May 21, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning | May 20, 2025 | Large Language Model, Mathematical Reasoning | —Unverified | 0 | 0 |
| Can Theoretical Physics Research Benefit from Language Agents? | Jun 6, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers | May 19, 2025 | In-Context Learning, Instruction Following | —Unverified | 0 | 0 |
| Causal Inference with Large Language Model: A Survey | Sep 15, 2024 | Causal Inference, Language Modeling | —Unverified | 0 | 0 |
| CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning | Jan 21, 2025 | Clustering, Mathematical Reasoning | —Unverified | 0 | 0 |
| Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | Jan 19, 2025 | Automated Theorem Proving, Math | —Unverified | 0 | 0 |
| CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities | Jan 13, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Channel Merging: Preserving Specialization for Merged Experts | Dec 18, 2024 | Code Generation, GPU | —Unverified | 0 | 0 |
| CLEAR: Contrasting Textual Feedback with Experts and Amateurs for Reasoning | Mar 24, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Coarse-to-Fine Process Reward Modeling for Enhanced Mathematical Reasoning | Jan 23, 2025 | Attribute, Mathematical Reasoning | —Unverified | 0 | 0 |
| CodeGemma: Open Code Models Based on Gemma | Jun 17, 2024 | Code Completion, Mathematical Reasoning | —Unverified | 0 | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | Oct 3, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Composing Ensembles of Pre-trained Models via Iterative Consensus | Oct 20, 2022 | Arithmetic Reasoning, Image Generation | —Unverified | 0 | 0 |
| Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | Aug 18, 2024 | HumanEval, Mathematical Reasoning | —Unverified | 0 | 0 |
| Conjectures, Tests and Proofs: An Overview of Theory Exploration | Sep 7, 2021 | Automated Theorem Proving, Mathematical Reasoning | —Unverified | 0 | 0 |
| ControlMath: Controllable Data Generation Promotes Math Generalist Models | Sep 20, 2024 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | Jul 8, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | —Unverified | 0 | 0 |
| DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training | Apr 24, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data | May 23, 2024 | Automated Theorem Proving, Mathematical Reasoning | —Unverified | 0 | 0 |
| Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | Apr 22, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Diversity-Aware Policy Optimization for Large Language Model Reasoning | May 29, 2025 | Diversity, Language Modeling | —Unverified | 0 | 0 |
| Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks | Oct 10, 2024 | 8k, Diversity | —Unverified | 0 | 0 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | Oct 19, 2024 | Logical Reasoning, Math | —Unverified | 0 | 0 |
| Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation | May 24, 2025 | Mathematical Reasoning, Multimodal Reasoning | —Unverified | 0 | 0 |
| Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models | May 27, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | May 20, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| Dual Instruction Tuning with Large Language Models for Mathematical Reasoning | Mar 27, 2024 | Domain Generalization, Mathematical Reasoning | —Unverified | 0 | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | Oct 29, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning | May 22, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Efficient Long CoT Reasoning in Small Language Models | May 24, 2025 | Mathematical Reasoning, valid | —Unverified | 0 | 0 |
| Efficient Model-agnostic Alignment via Bayesian Persuasion | May 29, 2024 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| Efficient Tool Use with Chain-of-Abstraction Reasoning | Jan 30, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Eliciting Reasoning in Language Models with Cognitive Tools | Jun 13, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | Oct 14, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory | Apr 18, 2024 | Machine Translation, Mathematical Reasoning | —Unverified | 0 | 0 |