| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic ReasoningGSM8K | CodeCode Available | 13 | 5 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic ReasoningMath | CodeCode Available | 9 | 5 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | CodeCode Available | 8 | 5 |
| LLaMA: Open and Efficient Foundation Language Models | Feb 27, 2023 | Arithmetic ReasoningCode Generation | CodeCode Available | 7 | 5 |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Mar 22, 2023 | Arithmetic ReasoningMathematical Reasoning | CodeCode Available | 6 | 5 |
| GPT-4 Technical Report | Mar 15, 2023 | answerability predictionArithmetic Reasoning | CodeCode Available | 6 | 5 |
| Mistral 7B | Oct 10, 2023 | answerability predictionArithmetic Reasoning | CodeCode Available | 6 | 5 |
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | May 17, 2023 | Arithmetic ReasoningDecision Making | CodeCode Available | 5 | 5 |
| ReFT: Representation Finetuning for Language Models | Apr 4, 2024 | Arithmetic Reasoning | CodeCode Available | 5 | 5 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 5 | 5 |