| Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Jan 28, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models | Jan 9, 2025 | Benchmarking, Mathematical Problem-Solving | Code Available | 1 |
| Efficiently Serving LLM Reasoning Programs with Certaindex | Dec 30, 2024 | Code Generation, Mathematical Problem-Solving | Code Available | 3 |
| Large Language Models for Mathematical Analysis | Dec 28, 2024 | Mathematical Problem-Solving, Mathematical Reasoning | Code Available | 0 |
| Training and Evaluating Language Models with Template-based Data Generation | Nov 27, 2024 | Data Augmentation, Math | Code Available | 1 |
| O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Nov 25, 2024 | Hallucination, Knowledge Distillation | Code Available | 7 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | Nov 7, 2024 | GSM8K, Mathematical Problem-Solving | Unverified | 0 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 5 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | Oct 24, 2024 | Logical Reasoning, Mathematical Problem-Solving | Unverified | 0 |
| Non-myopic Generation of Language Models for Reasoning and Planning | Oct 22, 2024 | Computational Efficiency, Language Modelling | Code Available | 1 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | Unverified | 0 |
| Can LLMs plan paths with extra hints from solvers? | Oct 7, 2024 | Mathematical Problem-Solving, Program Synthesis | Unverified | 0 |
| LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Oct 3, 2024 | Efficient Exploration, Mathematical Problem-Solving | Code Available | 5 |
| PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | Oct 2, 2024 | Data Augmentation, Diversity | Unverified | 0 |
| BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | Sep 26, 2024 | Math, Mathematical Problem-Solving | Code Available | 1 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | Unverified | 0 |
| Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | Aug 29, 2024 | GSM8K, Language Modeling | Unverified | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 |
| MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Jul 11, 2024 | Contrastive Learning, Language Modelling | Code Available | 4 |
| MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula | Jul 1, 2024 | Mathematical Problem-Solving | Code Available | 1 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | Benchmarking, Math | Code Available | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Jun 25, 2024 | Diversity, Math | Code Available | 2 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Jun 18, 2024 | Arithmetic Reasoning, Math | Code Available | 2 |
| GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Jun 18, 2024 | Code Generation, Mathematical Problem-Solving | Code Available | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | Unverified | 0 |
| 3D-Properties: Identifying Challenges in DPO and Charting a Path Forward | Jun 11, 2024 | Instruction Following, Mathematical Problem-Solving | Unverified | 0 |
| OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step | Jun 4, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | Benchmarking, Dialogue Understanding | Code Available | 1 |
| The Buffer Mechanism for Multi-Step Information Reasoning in Language Models | May 24, 2024 | Mathematical Problem-Solving | Unverified | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8K, Math | Unverified | 0 |
| Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions | Apr 29, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks | Apr 23, 2024 | Mathematical Problem-Solving, Question Answering | Code Available | 1 |
| Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks | Apr 19, 2024 | Mathematical Problem-Solving | Code Available | 0 |
| ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Apr 3, 2024 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 |
| PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Mar 26, 2024 | Code Completion, Few-Shot Learning | Code Available | 3 |
| SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models | Mar 12, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 |
| Premise Order Matters in Reasoning with Large Language Models | Feb 14, 2024 | GSM8K, Mathematical Problem-Solving | Unverified | 0 |
| Large Language Models for Mathematical Reasoning: Progresses and Challenges | Jan 31, 2024 | Diversity, Math | Unverified | 0 |
| G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | Dec 18, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning | Oct 20, 2023 | Mathematical Problem-Solving, Position | Unverified | 0 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Oct 19, 2023 | GSM8K, Math | Code Available | 0 |
| Data Contamination Through the Lens of Time | Oct 16, 2023 | Mathematical Problem-Solving | Code Available | 0 |
| The Consensus Game: Language Model Generation via Equilibrium Search | Oct 13, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Sep 29, 2023 | Arithmetic Reasoning, Computational Efficiency | Code Available | 3 |
| Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education | Sep 9, 2023 | Chatbot, Mathematical Problem-Solving | Unverified | 0 |
| Bayesian artificial brain with ChatGPT | Aug 28, 2023 | Mathematical Problem-Solving | Unverified | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | Jun 19, 2023 | In-Context Learning, Language Modeling | Unverified | 0 |