| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | May 12, 2025 | GSM8K, Large Language Model | —Unverified | 0 | 0 |
| Uncovering Latent Chain of Thought Vectors in Language Models | Sep 21, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | Aug 28, 2024 | Data Augmentation, GSM8K | —Unverified | 0 | 0 |
| Cool-Fusion: Fuse Large Language Models without Training | Jul 29, 2024 | Combinatorial Optimization, GSM8K | —Unverified | 0 | 0 |
| ControlMath: Controllable Data Generation Promotes Math Generalist Models | Sep 20, 2024 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | Jul 11, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational Efficiency, GSM8K | —Unverified | 0 | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8K, HellaSwag | —Unverified | 0 | 0 |
| Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Jul 29, 2024 | GSM8K, Prompt Engineering | —Unverified | 0 | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | Oct 23, 2024 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | Dec 11, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | Mar 6, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Complexity-Based Prompting for Multi-Step Reasoning | Oct 3, 2022 | Date Understanding, GSM8K | —Unverified | 0 | 0 |
| Solving math word problems with process- and outcome-based feedback | Nov 25, 2022 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | Oct 3, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Feb 26, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | May 30, 2024 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Feb 1, 2025 | GPU, GSM8K | —Unverified | 0 | 0 |
| Can Separators Improve Chain-of-Thought Prompting? | Feb 16, 2024 | 8k, GSM8K | —Unverified | 0 | 0 |
| Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation | Sep 5, 2024 | GSM8K | —Unverified | 0 | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | May 29, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | Sep 10, 2024 | GSM8K, Mixture-of-Experts | —Unverified | 0 | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | Oct 9, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | —Unverified | 0 | 0 |