| KwaiYiiMath: Technical Report | Oct 11, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Mistral 7B | Oct 10, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic Reasoning, Data Augmentation | Code Available | 2 |
| DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models | Oct 8, 2023 | Arithmetic Reasoning | Code Available | 1 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| DOMINO: A Dual-System for Multi-step Visual Language Reasoning | Oct 4, 2023 | Arithmetic Reasoning, Language Modeling | Code Available | 1 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Sep 29, 2023 | Arithmetic Reasoning, Computational Efficiency | Code Available | 3 |
| Are Human-generated Demonstrations Necessary for In-context Learning? | Sep 26, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | Sep 20, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 0 |
| Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL | Sep 13, 2023 | Arithmetic Reasoning, Navigate | Code Available | 1 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 5 |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | Aug 15, 2023 | Arithmetic Reasoning, Math | Code Available | 2 |
| Token-Scaled Logit Distillation for Ternary Weight Generative Language Models | Aug 13, 2023 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 1 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Aug 3, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | Code Available | 8 |
| Model Card and Evaluations for Claude Models | Jul 11, 2023 | Arithmetic Reasoning, Bug fixing | Unverified | 0 |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | Jun 23, 2023 | Arithmetic Reasoning, Knowledge Distillation | Unverified | 0 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic Reasoning, Benchmarking | Code Available | 1 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | Jun 22, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Boosting Language Models Reasoning with Chain-of-Knowledge Prompting | Jun 10, 2023 | Arithmetic Reasoning | Code Available | 1 |
| Prompt Space Optimizing Few-shot Reasoning Success with Large Language Models | Jun 6, 2023 | Arithmetic Reasoning, In-Context Learning | Code Available | 0 |
| Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | May 30, 2023 | Arithmetic Reasoning, Machine Translation | Code Available | 2 |
| Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models | May 29, 2023 | Arithmetic Reasoning | Unverified | 0 |
| Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | May 24, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 0 |
| A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis | May 24, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 1 |
| PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 0 |
| Automatic Model Selection with Large Language Models for Reasoning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | May 19, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | May 19, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | May 17, 2023 | Arithmetic Reasoning, Decision Making | Code Available | 5 |
| SatLM: Satisfiability-Aided Language Models Using Declarative Prompting | May 16, 2023 | Arithmetic Reasoning, Language Modeling | Code Available | 1 |
| Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency | May 14, 2023 | Arithmetic Reasoning, Math | Code Available | 0 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | May 13, 2023 | Arithmetic Reasoning, Code Completion | Code Available | 0 |
| Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting | May 11, 2023 | All, Arithmetic Reasoning | Code Available | 1 |
| MoT: Memory-of-Thought Enables ChatGPT to Self-Improve | May 9, 2023 | Arithmetic Reasoning, Natural Language Inference | Code Available | 1 |
| Self-Evaluation Guided Beam Search for Reasoning | May 1, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Progressive-Hint Prompting Improves Reasoning in Large Language Models | Apr 19, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| When do you need Chain-of-Thought Prompting for ChatGPT? | Apr 6, 2023 | Arithmetic Reasoning, Memorization | Unverified | 0 |
| LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models | Apr 4, 2023 | Arithmetic Reasoning, Language Modelling | Code Available | 3 |
| Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks | Apr 4, 2023 | Arithmetic Reasoning, Language Modelling | Code Available | 1 |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Mar 22, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 6 |
| GPT-4 Technical Report | Mar 15, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| MathPrompter: Mathematical Reasoning using Large Language Models | Mar 4, 2023 | Arithmetic Reasoning, Math | Code Available | 1 |
| LLaMA: Open and Efficient Foundation Language Models | Feb 27, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 7 |
| Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data | Feb 24, 2023 | Arithmetic Reasoning, Language Modelling | Code Available | 1 |
| LEVER: Learning to Verify Language-to-Code Generation with Execution | Feb 16, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Do Deep Neural Networks Capture Compositionality in Arithmetic Reasoning? | Feb 15, 2023 | Arithmetic Reasoning | Code Available | 0 |
| Is ChatGPT a General-Purpose Natural Language Processing Task Solver? | Feb 8, 2023 | Arithmetic Reasoning, Zero-Shot Learning | Code Available | 2 |