| GOLD: Geometry Problem Solver with Natural Language Description | May 1, 2024 | Math | Code Available | 1 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Apr 18, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data Augmentation, GSM8K | Code Available | 1 |
| Stepwise Self-Consistent Mathematical Reasoning with Large Language Models | Feb 24, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models | Feb 20, 2024 | Mathematical Reasoning | Code Available | 1 |
| Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | Feb 18, 2024 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 1 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem Proving, Language Modelling | Code Available | 1 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| Question Translation Training for Better Multilingual Reasoning | Jan 15, 2024 | Mathematical Reasoning, Translation | Code Available | 1 |
| MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization | Jan 12, 2024 | Mathematical Reasoning | Code Available | 1 |
| An In-depth Look at Gemini's Language Abilities | Dec 18, 2023 | Instruction Following, Math | Code Available | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | Nov 16, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8K, Math | Code Available | 1 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8K, Math | Code Available | 1 |
| TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models | Oct 10, 2023 | Code Generation, Continual Learning | Code Available | 1 |
| Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | Oct 6, 2023 | Code Completion, In-Context Learning | Code Available | 1 |
| SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training | Oct 3, 2023 | Contrastive Learning, Equation Discovery | Code Available | 1 |
| Auto-Regressive Next-Token Predictors are Universal Learners | Sep 13, 2023 | Mathematical Reasoning, Text Generation | Code Available | 1 |
| Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch | Aug 23, 2023 | Mathematical Reasoning | Code Available | 1 |
| Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | Aug 16, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 |