| Over-Reasoning and Redundant Calculation of Large Language Models | Jan 21, 2024 | GSM8KMath | CodeCode Available | 1 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Jan 19, 2024 | GSM8KMath | CodeCode Available | 1 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic ReasoningCode Generation | CodeCode Available | 1 |
| ReFT: Reasoning with Reinforced Fine-Tuning | Jan 17, 2024 | GSM8KMath | CodeCode Available | 4 |
| Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | Jan 16, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Jan 16, 2024 | GSM8KMath | CodeCode Available | 3 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8KLanguage Model Evaluation | CodeCode Available | 1 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | ChatbotGSM8K | —Unverified | 0 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | Dec 18, 2023 | DiversityGSM8K | —Unverified | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic ReasoningGSM8K | —Unverified | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic ReasoningFew-Shot Learning | —Unverified | 0 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| Training Chain-of-Thought via Latent-Variable Inference | Nov 28, 2023 | GSM8K | —Unverified | 0 |
| AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations | Nov 22, 2023 | Common Sense ReasoningGSM8K | CodeCode Available | 0 |
| Meta Prompting for AI Systems | Nov 20, 2023 | Data InteractionGSM8K | CodeCode Available | 2 |
| Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization | Nov 17, 2023 | ARCGSM8K | CodeCode Available | 1 |
| OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | Nov 16, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs | Nov 16, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | Nov 14, 2023 | GSM8KMath | —Unverified | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic ReasoningGSM8K | —Unverified | 0 |
| SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks | Nov 14, 2023 | GSM8KMath | —Unverified | 0 |
| Let's Reinforce Step by Step | Nov 10, 2023 | GSM8KLogical Reasoning | —Unverified | 0 |
| Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | Nov 10, 2023 | GSM8KMemorization | CodeCode Available | 1 |
| Prompt Engineering a Prompt Engineer | Nov 9, 2023 | counterfactualCounterfactual Reasoning | —Unverified | 0 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | Nov 6, 2023 | DecoderGSM8K | CodeCode Available | 2 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8KMMLU | —Unverified | 0 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| SkyMath: Technical Report | Oct 25, 2023 | GSM8KLanguage Modeling | CodeCode Available | 3 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Oct 19, 2023 | GSM8KMath | CodeCode Available | 0 |
| DavIR: Data Selection via Implicit Reward for Large Language Models | Oct 16, 2023 | Causal Language ModelingGSM8K | —Unverified | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | Oct 16, 2023 | Code GenerationGSM8K | —Unverified | 0 |
| KwaiYiiMath: Technical Report | Oct 11, 2023 | Arithmetic ReasoningGSM8K | —Unverified | 0 |
| TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models | Oct 10, 2023 | Code GenerationContinual Learning | CodeCode Available | 1 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic ReasoningData Augmentation | CodeCode Available | 2 |
| LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models | Oct 9, 2023 | GSM8KIn-Context Learning | CodeCode Available | 5 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 |
| From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | Oct 4, 2023 | BenchmarkingGPU | —Unverified | 0 |
| Large Language Models as Analogical Reasoners | Oct 3, 2023 | Code GenerationGSM8K | —Unverified | 0 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Oct 3, 2023 | GSM8KMath | CodeCode Available | 0 |
| Think before you speak: Training Language Models With Pause Tokens | Oct 3, 2023 | DecoderGSM8K | —Unverified | 0 |
| Adapting LLM Agents with Universal Feedback in Communication | Oct 1, 2023 | Decision MakingGSM8K | —Unverified | 0 |
| UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | Sep 30, 2023 | Causal JudgmentGSM8K | —Unverified | 0 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 |
| Design of Chain-of-Thought in Math Problem Solving | Sep 20, 2023 | DiversityGSM8K | CodeCode Available | 1 |
| Baichuan 2: Open Large-scale Language Models | Sep 19, 2023 | Feature EngineeringGSM8K | CodeCode Available | 4 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8KHellaSwag | —Unverified | 0 |
| EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Sep 16, 2023 | Date UnderstandingGSM8K | CodeCode Available | 0 |
| Exploring an LM to generate Prolog Predicates from Mathematics Questions | Sep 7, 2023 | GSM8KLanguage Modeling | —Unverified | 0 |
| Large Language Models as Optimizers | Sep 7, 2023 | GSM8K | CodeCode Available | 1 |