| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 13 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic Reasoning, Math | Code Available | 9 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | Code Available | 8 |
| LLaMA: Open and Efficient Foundation Language Models | Feb 27, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 7 |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Mar 22, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 6 |
| Mistral 7B | Oct 10, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| GPT-4 Technical Report | Mar 15, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | May 17, 2023 | Arithmetic Reasoning, Decision Making | Code Available | 5 |
| ReFT: Representation Finetuning for Language Models | Apr 4, 2024 | Arithmetic Reasoning | Code Available | 5 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 5 |
| OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Oct 2, 2024 | Arithmetic Reasoning, Large Language Model | Code Available | 4 |
| OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Feb 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 4 |
| Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Jun 26, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 3 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic Reasoning, GSM8K | Code Available | 3 |
| WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks | Jul 7, 2024 | Arithmetic Reasoning | Code Available | 3 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Sep 29, 2023 | Arithmetic Reasoning, Computational Efficiency | Code Available | 3 |
| Llemma: An Open Language Model For Mathematics | Oct 16, 2023 | Arithmetic Reasoning, Automated Theorem Proving | Code Available | 3 |
| LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models | Apr 4, 2023 | Arithmetic Reasoning, Language Modelling | Code Available | 3 |
| Reasoning with Language Model Prompting: A Survey | Dec 19, 2022 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 3 |
| Progressive-Hint Prompting Improves Reasoning in Large Language Models | Apr 19, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Jan 5, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 2 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic Reasoning, Data Augmentation | Code Available | 2 |
| An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | Feb 23, 2024 | Arithmetic Reasoning, Automated Theorem Proving | Code Available | 2 |
| Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs | Jun 13, 2024 | Arithmetic Reasoning, Fact Verification | Code Available | 2 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Jun 18, 2024 | Arithmetic Reasoning, Math | Code Available | 2 |