| MathAttack: Attacking Large Language Models Towards Math Solving Ability | Sep 4, 2023 | Adversarial Attack, GSM8K | Unverified | 0 |
| No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | Sep 1, 2023 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| AskIt: Unified Programming Interface for Programming with Large Language Models | Aug 29, 2023 | Code Generation, Few-Shot Learning | Code Available | 1 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Aug 21, 2023 | GSM8K | Code Available | 0 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 5 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Aug 3, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | Aug 1, 2023 | GSM8K, Math | Code Available | 1 |
| A mixed policy to improve performance of language models on math problems | Jul 17, 2023 | GSM8K, Math | Code Available | 0 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | Jun 22, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Interpretable Math Word Problem Solution Generation Via Step-by-step Planning | Jun 1, 2023 | GSM8K, Language Modeling | Unverified | 0 |
| Matrix Information Theory for Self-Supervised Learning | May 27, 2023 | Contrastive Learning, GSM8K | Code Available | 1 |
| Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models | May 26, 2023 | GSM8K, Multimodal Reasoning | Code Available | 3 |
| GRACE: Discriminator-Guided Chain-of-Thought Reasoning | May 24, 2023 | GSM8K, Math | Code Available | 1 |
| Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | May 24, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 0 |
| Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement | May 23, 2023 | GSM8K | Code Available | 1 |
| PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 0 |
| Automatic Model Selection with Large Language Models for Reasoning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | May 19, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | May 19, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Self-Evaluation Guided Beam Search for Reasoning | May 1, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Progressive-Hint Prompting Improves Reasoning in Large Language Models | Apr 19, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Solving Math Word Problems by Combining Language Models With Symbolic Solvers | Apr 16, 2023 | GSM8K, Language Modeling | Code Available | 1 |
| Boosted Prompt Ensembles for Large Language Models | Apr 12, 2023 | GSM8K, Language Modeling | Code Available | 1 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Jan 27, 2023 | Few-Shot Learning, GSM8K | Code Available | 1 |
| Teaching Small Language Models to Reason | Dec 16, 2022 | GSM8K, Knowledge Distillation | Unverified | 0 |
| Distilling Reasoning Capabilities into Smaller Language Models | Dec 1, 2022 | GSM8K, Knowledge Distillation | Code Available | 0 |
| Explicit Knowledge Transfer for Weakly-Supervised Code Generation | Nov 30, 2022 | Code Generation, Few-Shot Learning | Unverified | 0 |
| Solving math word problems with process- and outcome-based feedback | Nov 25, 2022 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic Reasoning, GSM8K | Code Available | 3 |
| Large Language Models Can Self-Improve | Oct 20, 2022 | Arithmetic Reasoning, Common Sense Reasoning | Unverified | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | Oct 20, 2022 | Arithmetic Reasoning, Cross-Lingual Question Answering | Unverified | 0 |
| Language Models are Multilingual Chain-of-Thought Reasoners | Oct 6, 2022 | GSM8K, Math | Code Available | 2 |
| Complexity-Based Prompting for Multi-Step Reasoning | Oct 3, 2022 | Date Understanding, GSM8K | Unverified | 0 |
| Making Large Language Models Better Reasoners with Step-Aware Verifier | Jun 6, 2022 | Arithmetic Reasoning, Few-Shot Learning | Unverified | 0 |
| Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | May 28, 2022 | Arithmetic Reasoning, Efficient Exploration | Code Available | 1 |
| Large Language Models are Zero-Shot Reasoners | May 24, 2022 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 2 |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | Mar 21, 2022 | ARC, Arithmetic Reasoning | Code Available | 1 |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jan 28, 2022 | Common Sense Reasoning, GSM8K | Code Available | 6 |
| Training Verifiers to Solve Math Word Problems | Oct 27, 2021 | GSM8K, Math | Code Available | 3 |