| MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Jan 16, 2024 | GSM8K, Math | Code Available | 3 |
| SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models | Jan 15, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| Question Translation Training for Better Multilingual Reasoning | Jan 15, 2024 | Mathematical Reasoning, Translation | Code Available | 1 |
| CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities | Jan 13, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization | Jan 12, 2024 | Mathematical Reasoning | Code Available | 1 |
| Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs | Dec 29, 2023 | Mathematical Reasoning | Unverified | 0 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 |
| Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments | Dec 26, 2023 | Knowledge Distillation, Mathematical Reasoning | Unverified | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | Chatbot, GSM8K | Unverified | 0 |
| GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning | Dec 19, 2023 | Mathematical Reasoning | Unverified | 0 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | Dec 18, 2023 | Diversity, GSM8K | Unverified | 0 |
| An In-depth Look at Gemini's Language Abilities | Dec 18, 2023 | Instruction Following, Math | Code Available | 1 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic Reasoning, Few-Shot Learning | Unverified | 0 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Assessing GPT4-V on Structured Reasoning Tasks | Dec 13, 2023 | Code Generation, Language Modeling | Unverified | 0 |
| Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | Dec 9, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 0 |
| Universal Self-Consistency for Large Language Model Generation | Nov 29, 2023 | Code Generation, Language Modeling | Unverified | 0 |
| LANS: A Layout-Aware Neural Solver for Plane Geometry Problem | Nov 25, 2023 | Geometry Problem Solving, Language Modelling | Unverified | 0 |
| AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations | Nov 22, 2023 | Common Sense Reasoning, GSM8K | Code Available | 0 |
| Orca 2: Teaching Small Language Models How to Reason | Nov 18, 2023 | Arithmetic Reasoning, Common Sense Reasoning | Unverified | 0 |
| OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | Nov 16, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | Nov 14, 2023 | GSM8K, Math | Unverified | 0 |
| VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency | Nov 13, 2023 | Math, Mathematical Reasoning | Code Available | 0 |
| Let's Reinforce Step by Step | Nov 10, 2023 | GSM8K, Logical Reasoning | Unverified | 0 |
| ATHENA: Mathematical Reasoning with Thought Expansion | Nov 2, 2023 | Math, Mathematical Reasoning | Code Available | 0 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8K, Math | Code Available | 1 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8K, Math | Code Available | 1 |
| math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories | Oct 25, 2023 | Automated Theorem Proving, Language Modeling | Unverified | 0 |
| SkyMath: Technical Report | Oct 25, 2023 | GSM8K, Language Modeling | Code Available | 3 |
| MCC-KD: Multi-CoT Consistent Knowledge Distillation | Oct 23, 2023 | Diversity, Knowledge Distillation | Code Available | 0 |
| MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models | Oct 19, 2023 | Hallucination, Mathematical Reasoning | Code Available | 0 |
| Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | Oct 17, 2023 | Mathematical Reasoning, Sentiment Analysis | Unverified | 0 |
| DavIR: Data Selection via Implicit Reward for Large Language Models | Oct 16, 2023 | Causal Language Modeling, GSM8K | Unverified | 0 |
| TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models | Oct 16, 2023 | Automated Theorem Proving, Benchmarking | Code Available | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | Oct 16, 2023 | Code Generation, GSM8K | Unverified | 0 |
| An Expression Tree Decoding Strategy for Mathematical Equation Generation | Oct 14, 2023 | Math, Mathematical Reasoning | Code Available | 2 |
| KwaiYiiMath: Technical Report | Oct 11, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models | Oct 10, 2023 | Code Generation, Continual Learning | Code Available | 1 |
| Mistral 7B | Oct 10, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Oct 9, 2023 | Code Generation, Instruction Following | Code Available | 3 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic Reasoning, Data Augmentation | Code Available | 2 |
| LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation | Oct 6, 2023 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | Oct 6, 2023 | Code Completion, In-Context Learning | Code Available | 1 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Notes on a Path to AI Assistance in Mathematical Reasoning | Oct 4, 2023 | Mathematical Reasoning | Unverified | 0 |
| Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | Oct 3, 2023 | Math, Mathematical Reasoning | Unverified | 0 |
| SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training | Oct 3, 2023 | Contrastive Learning, Equation Discovery | Code Available | 1 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | Chatbot, Image Captioning | Code Available | 2 |