| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Jun 6, 2024 | Arithmetic Reasoning, Code Generation | —Unverified | 0 |
| Improve Mathematical Reasoning in Language Models by Automated Process Supervision | Jun 5, 2024 | GSM8K, Math | —Unverified | 0 |
| Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models | Jun 5, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Pre-trained Large Language Models Use Fourier Features to Compute Addition | Jun 5, 2024 | Mathematical Reasoning | —Unverified | 0 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | Jun 5, 2024 | Mathematical Reasoning, Natural Language Inference | —Unverified | 0 |
| Process-Driven Autoformalization in Lean 4 | Jun 4, 2024 | Mathematical Reasoning | Code Available | 1 |
| Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data | Jun 4, 2024 | Mathematical Reasoning, Text Generation | —Unverified | 0 |
| Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction | Jun 2, 2024 | Mathematical Reasoning | Code Available | 0 |
| Efficient Model-agnostic Alignment via Bayesian Persuasion | May 29, 2024 | Code Generation, Mathematical Reasoning | —Unverified | 0 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | Benchmarking, Dialogue Understanding | Code Available | 1 |
| ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation | May 27, 2024 | Code Generation, HumanEval | Code Available | 1 |
| STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making | May 25, 2024 | Decision Making, Mathematical Reasoning | Code Available | 1 |
| Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications | May 24, 2024 | Code Generation, Low-rank compression | —Unverified | 0 |
| VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks | May 24, 2024 | Mathematical Reasoning, Natural Language Understanding | Code Available | 1 |
| Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models | May 24, 2024 | Atari Games, Mathematical Reasoning | Code Available | 2 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data Augmentation, Math | Code Available | 0 |
| DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data | May 23, 2024 | Automated Theorem Proving, Mathematical Reasoning | —Unverified | 0 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge Distillation, Math | Code Available | 1 |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | May 22, 2024 | Mathematical Reasoning, Multiple-choice | Code Available | 1 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | Diagnostic, Math | Code Available | 0 |
| MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | May 20, 2024 | Continual Pretraining, Mathematical Reasoning | Code Available | 3 |
| A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks | May 16, 2024 | Code Generation, Dialogue Generation | —Unverified | 0 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | May 13, 2024 | Data Augmentation, GSM8K | Code Available | 3 |
| MathDivide: Improved mathematical reasoning by large language models | May 12, 2024 | GSM8K, Logical Reasoning | —Unverified | 0 |
| Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement | May 10, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought | May 9, 2024 | Hallucination, Math | —Unverified | 0 |
| VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | May 8, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| AlphaMath Almost Zero: Process Supervision without Process | May 6, 2024 | Mathematical Reasoning, Math Word Problem Solving | Code Available | 3 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | May 5, 2024 | GSM8K, Math | Code Available | 2 |
| GOLD: Geometry Problem Solver with Natural Language Description | May 1, 2024 | Math | Code Available | 1 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | May 1, 2024 | GSM8K, Language Modeling | —Unverified | 0 |
| Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions | Apr 29, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Benchmarking Benchmark Leakage in Large Language Models | Apr 29, 2024 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | Apr 22, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | Apr 22, 2024 | Domain Adaptation, GSM8K | —Unverified | 0 |
| Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models | Apr 22, 2024 | Decoder, Mathematical Reasoning | Code Available | 0 |
| iTBLS: A Dataset of Interactive Conversations Over Tabular Information | Apr 19, 2024 | Articles, Mathematical Reasoning | —Unverified | 0 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Apr 18, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory | Apr 18, 2024 | Machine Translation, Mathematical Reasoning | —Unverified | 0 |
| Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Apr 17, 2024 | Form, Language Model Evaluation | Code Available | 0 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Apr 16, 2024 | GSM8K, Math | Code Available | 2 |
| Compression Represents Intelligence Linearly | Apr 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition | Apr 10, 2024 | Code Generation, Mathematical Reasoning | Code Available | 0 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Apr 8, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models | Apr 5, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | Apr 1, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 |
| Planning and Editing What You Retrieve for Enhanced Tool Learning | Mar 30, 2024 | Mathematical Reasoning, Retrieval | Code Available | 0 |
| Dual Instruction Tuning with Large Language Models for Mathematical Reasoning | Mar 27, 2024 | Domain Generalization, Mathematical Reasoning | —Unverified | 0 |