| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8K, Question Answering | Code Available | 2 | 5 |
| Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | Feb 8, 2024 | GSM8K, reinforcement-learning | Code Available | 2 | 5 |
| Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code Available | 2 | 5 |
| How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | Dec 4, 2024 | GSM8K | Code Available | 2 | 5 |
| Meta Prompting for AI Systems | Nov 20, 2023 | Data Interaction, GSM8K | Code Available | 2 | 5 |
| VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Oct 2, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 | 5 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Jan 22, 2024 | Diversity, GSM8K | Code Available | 2 | 5 |
| Natural Language Fine-Tuning | Dec 29, 2024 | GSM8K, Large Language Model | Code Available | 2 | 5 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 | 5 |
| SIFT: Grounding LLM Reasoning in Contexts via Stickers | Feb 19, 2025 | GSM8K, Math | Code Available | 2 | 5 |
| Weak-to-Strong Reasoning | Jul 18, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Aug 3, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 | 5 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | May 5, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| any4: Learned 4-bit Numeric Representation for LLMs | Jul 7, 2025 | GPU, GSM8K | Code Available | 2 | 5 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Sep 19, 2024 | Computational Efficiency, GSM8K | Code Available | 2 | 5 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Feb 24, 2025 | GSM8K, Math | Code Available | 2 | 5 |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Apr 7, 2025 | GSM8K | Code Available | 2 | 5 |
| Progressive-Hint Prompting Improves Reasoning in Large Language Models | Apr 19, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 | 5 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic Reasoning, Data Augmentation | Code Available | 2 | 5 |
| LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | Mar 22, 2024 | Data Augmentation, GSM8K | Code Available | 2 | 5 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 | 5 |
| Reformatted Alignment | Feb 19, 2024 | GSM8K, Hallucination | Code Available | 2 | 5 |
| ProcessBench: Identifying Process Errors in Mathematical Reasoning | Dec 9, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Oct 11, 2024 | GSM8K, Language Modeling | Code Available | 2 | 5 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | Benchmarking, GSM8K | Code Available | 2 | 5 |
| CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Feb 13, 2025 | GSM8K | Code Available | 2 | 5 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| Preference Optimization for Reasoning with Pseudo Feedback | Nov 25, 2024 | GSM8K, Math | Code Available | 2 | 5 |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | May 19, 2025 | GSM8K, Math | Code Available | 2 | 5 |
| Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | May 28, 2022 | Arithmetic Reasoning, Efficient Exploration | Code Available | 1 | 5 |
| Learning Goal-Conditioned Representations for Language Reward Models | Jul 18, 2024 | GSM8K, Math | Code Available | 1 | 5 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8K, Math | Code Available | 1 | 5 |
| OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | Nov 16, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| Automatic Model Selection with Large Language Models for Reasoning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8K, In-Context Learning | Code Available | 1 | 5 |
| CommVQ: Commutative Vector Quantization for KV Cache Compression | Jun 23, 2025 | GPU, GSM8K | Code Available | 1 | 5 |
| Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries | Dec 12, 2024 | 4k, GSM8K | Code Available | 1 | 5 |
| Over-Reasoning and Redundant Calculation of Large Language Models | Jan 21, 2024 | GSM8K, Math | Code Available | 1 | 5 |
| Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning | Oct 8, 2024 | GSM8K, Multi-agent Reinforcement Learning | Code Available | 1 | 5 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Jan 27, 2023 | Few-Shot Learning, GSM8K | Code Available | 1 | 5 |
| Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks | Sep 20, 2024 | ARC, GSM8K | Code Available | 1 | 5 |
| Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs | Nov 16, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| Language Models as Science Tutors | Feb 16, 2024 | GSM8K, Math | Code Available | 1 | 5 |
| Large Language Models as Optimizers | Sep 7, 2023 | GSM8K | Code Available | 1 | 5 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8K, Language Model Evaluation | Code Available | 1 | 5 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| Large Language Models are Contrastive Reasoners | Mar 13, 2024 | GSM8K | Code Available | 1 | 5 |
| AskIt: Unified Programming Interface for Programming with Large Language Models | Aug 29, 2023 | Code Generation, Few-Shot Learning | Code Available | 1 | 5 |