| KwaiYiiMath: Technical Report | Oct 11, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Mistral 7B | Oct 10, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic Reasoning, Data Augmentation | Code Available | 2 |
| DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models | Oct 8, 2023 | Arithmetic Reasoning | Code Available | 1 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| DOMINO: A Dual-System for Multi-step Visual Language Reasoning | Oct 4, 2023 | Arithmetic Reasoning, Language Modeling | Code Available | 1 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Sep 29, 2023 | Arithmetic Reasoning, Computational Efficiency | Code Available | 3 |
| Are Human-generated Demonstrations Necessary for In-context Learning? | Sep 26, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | Sep 20, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 0 |
| Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL | Sep 13, 2023 | Arithmetic Reasoning, Navigate | Code Available | 1 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 5 |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | Aug 15, 2023 | Arithmetic Reasoning, Math | Code Available | 2 |
| Token-Scaled Logit Distillation for Ternary Weight Generative Language Models | Aug 13, 2023 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 1 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Aug 3, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | Code Available | 8 |
| Model Card and Evaluations for Claude Models | Jul 11, 2023 | Arithmetic Reasoning, Bug fixing | Unverified | 0 |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | Jun 23, 2023 | Arithmetic Reasoning, Knowledge Distillation | Unverified | 0 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | Jun 22, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Boosting Language Models Reasoning with Chain-of-Knowledge Prompting | Jun 10, 2023 | Arithmetic Reasoning | Code Available | 1 |
| Prompt Space Optimizing Few-shot Reasoning Success with Large Language Models | Jun 6, 2023 | Arithmetic Reasoning, In-Context Learning | Code Available | 0 |
| Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | May 30, 2023 | Arithmetic Reasoning, Machine Translation | Code Available | 2 |
| Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models | May 29, 2023 | Arithmetic Reasoning | Unverified | 0 |