| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | May 12, 2025 | GSM8K, Large Language Model | —Unverified | 0 | 0 |
| Uncovering Latent Chain of Thought Vectors in Language Models | Sep 21, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | Aug 28, 2024 | Data Augmentation, GSM8K | —Unverified | 0 | 0 |
| Cool-Fusion: Fuse Large Language Models without Training | Jul 29, 2024 | Combinatorial Optimization, GSM8K | —Unverified | 0 | 0 |
| ControlMath: Controllable Data Generation Promotes Math Generalist Models | Sep 20, 2024 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | Jul 11, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational Efficiency, GSM8K | —Unverified | 0 | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8K, HellaSwag | —Unverified | 0 | 0 |
| Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Jul 29, 2024 | GSM8K, Prompt Engineering | —Unverified | 0 | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | Oct 23, 2024 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | Dec 11, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | Mar 6, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Complexity-Based Prompting for Multi-Step Reasoning | Oct 3, 2022 | Date Understanding, GSM8K | —Unverified | 0 | 0 |
| Solving math word problems with process- and outcome-based feedback | Nov 25, 2022 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | Oct 3, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Feb 26, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | May 30, 2024 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Feb 1, 2025 | GPU, GSM8K | —Unverified | 0 | 0 |
| Can Separators Improve Chain-of-Thought Prompting? | Feb 16, 2024 | 8k, GSM8K | —Unverified | 0 | 0 |
| Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation | Sep 5, 2024 | GSM8K | —Unverified | 0 | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | May 29, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | Sep 10, 2024 | GSM8K, Mixture-of-Experts | —Unverified | 0 | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | Oct 9, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| BrainTransformers: SNN-LLM | Oct 3, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Supervisory Prompt Training | Mar 26, 2024 | GSM8K, Sentence | —Unverified | 0 | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking, GSM8K | —Unverified | 0 | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | Feb 20, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic Reasoning, Few-Shot Learning | —Unverified | 0 | 0 |
| System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | May 25, 2025 | GSM8K | —Unverified | 0 | 0 |
| System-2 Mathematical Reasoning via Enriched Instruction Tuning | Dec 22, 2024 | ERP, GSM8K | —Unverified | 0 | 0 |
| BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation | Feb 3, 2025 | Diversity, GSM8K | —Unverified | 0 | 0 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Mar 18, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Teaching Small Language Models to Reason | Dec 16, 2022 | GSM8K, Knowledge Distillation | —Unverified | 0 | 0 |
| Adaptive Decoding via Latent Preference Optimization | Nov 14, 2024 | GSM8K, Instruction Following | —Unverified | 0 | 0 |
| Adapting LLM Agents with Universal Feedback in Communication | Oct 1, 2023 | Decision Making, GSM8K | —Unverified | 0 | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8K, MMLU | —Unverified | 0 | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| When is the consistent prediction likely to be a correct prediction? | Jul 8, 2024 | GSM8K, Prediction | —Unverified | 0 | 0 |
| Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning | Feb 16, 2025 | GSM8K | —Unverified | 0 | 0 |
| The Role of Deductive and Inductive Reasoning in Large Language Models | Oct 3, 2024 | GSM8K | —Unverified | 0 | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | Feb 9, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| Think before you speak: Training Language Models With Pause Tokens | Oct 3, 2023 | Decoder, GSM8K | —Unverified | 0 | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Oct 10, 2024 | Arithmetic Reasoning, Computational Efficiency | —Unverified | 0 | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | Jun 5, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | Aug 18, 2024 | Diversity, GPU | —Unverified | 0 | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |