Mathematical Problem-Solving

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–50 of 106 papers

Title	Date	Tasks	Status	Hype
EvoAgentX: An Automated Framework for Evolving Agentic Workflows	Jul 4, 2025	Code GenerationMath	CodeCode Available	7
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?	Nov 25, 2024	HallucinationKnowledge Distillation	CodeCode Available	7
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent	Nov 4, 2024	Logical ReasoningMathematical Problem-Solving	CodeCode Available	5
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning	Oct 3, 2024	Efficient ExplorationMathematical Problem-Solving	CodeCode Available	5
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine	Jul 11, 2024	Contrastive LearningLanguage Modelling	CodeCode Available	4
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model	Dec 18, 2023	Language ModelingLanguage Modelling	CodeCode Available	4
Efficiently Serving LLM Reasoning Programs with Certaindex	Dec 30, 2024	Code GenerationMathematical Problem-Solving	CodeCode Available	3
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models	Mar 26, 2024	Code CompletionFew-Shot Learning	CodeCode Available	3
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	Sep 29, 2023	Arithmetic ReasoningComputational Efficiency	CodeCode Available	3
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving	May 12, 2025	MathMathematical Problem-Solving	CodeCode Available	2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation	Feb 26, 2025	Code GenerationHumanEval	CodeCode Available	2
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Feb 7, 2025	Mathematical Problem-Solvingreinforcement-learning	CodeCode Available	2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data	Jun 26, 2024	BenchmarkingMath	CodeCode Available	2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models	Jun 25, 2024	DiversityMath	CodeCode Available	2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	Jun 18, 2024	Arithmetic ReasoningMath	CodeCode Available	2
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline	Apr 3, 2024	MathMathematical Problem-Solving	CodeCode Available	2
Measuring Mathematical Problem Solving With the MATH Dataset	Mar 5, 2021	MathMathematical Problem-Solving	CodeCode Available	2
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning	Jun 10, 2025	Knowledge DistillationMath	CodeCode Available	1
Solving Inequality Proofs with Large Language Models	Jun 9, 2025	Mathematical Problem-SolvingRelation Prediction	CodeCode Available	1
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning	Jun 5, 2025	Dataset GenerationMathematical Problem-Solving	CodeCode Available	1
RaDeR: Reasoning-aware Dense Retrieval Models	May 23, 2025	MathMathematical Problem-Solving	CodeCode Available	1
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8KMath	CodeCode Available	1
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion	Mar 20, 2025	Data AugmentationMathematical Problem-Solving	CodeCode Available	1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	MathMathematical Problem-Solving	CodeCode Available	1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code GenerationHumanEval	CodeCode Available	1
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models	Feb 16, 2025	Language ModelingLanguage Modelling	CodeCode Available	1
Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs	Jan 11, 2025	MathMathematical Problem-Solving	CodeCode Available	1
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models	Jan 9, 2025	BenchmarkingMathematical Problem-Solving	CodeCode Available	1
Training and Evaluating Language Models with Template-based Data Generation	Nov 27, 2024	Data AugmentationMath	CodeCode Available	1
Non-myopic Generation of Language Models for Reasoning and Planning	Oct 22, 2024	Computational EfficiencyLanguage Modelling	CodeCode Available	1
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search	Sep 26, 2024	MathMathematical Problem-Solving	CodeCode Available	1
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula	Jul 1, 2024	Mathematical Problem-Solving	CodeCode Available	1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions	May 29, 2024	BenchmarkingDialogue Understanding	CodeCode Available	1
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks	Apr 23, 2024	Mathematical Problem-SolvingQuestion Answering	CodeCode Available	1
Evaluating Language Models for Mathematics through Interactions	Jun 2, 2023	Language ModellingMathematical Problem-Solving	CodeCode Available	1
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets	May 29, 2023	Bias DetectionCode Generation	CodeCode Available	1
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers	Apr 1, 2023	Inductive BiasMathematical Problem-Solving	CodeCode Available	1
LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning	Jun 16, 2025	Code GenerationMathematical Problem-Solving	CodeCode Available	0
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving	Jun 12, 2025	Logical ReasoningMathematical Problem-Solving	—Unverified	0
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation	Jun 8, 2025	Code GenerationMathematical Problem-Solving	CodeCode Available	0
PoLAR: Polar-Decomposed Low-Rank Adapter Representation	Jun 3, 2025	Mathematical Problem-SolvingRiemannian optimization	—Unverified	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8KMathematical Problem-Solving	—Unverified	0
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?	May 28, 2025	MathMathematical Problem-Solving	CodeCode Available	0
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision	May 26, 2025	HallucinationMath	CodeCode Available	0
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers	May 26, 2025	Logical ReasoningMathematical Problem-Solving	CodeCode Available	0
Can reasoning models comprehend mathematical problems in Chinese ancient texts? An empirical study based on data from Suanjing Shishu	May 22, 2025	Mathematical Problem-Solving	—Unverified	0
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving	May 22, 2025	DiagnosticMathematical Problem-Solving	—Unverified	0
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems	May 21, 2025	BenchmarkingMath	—Unverified	0
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class	May 17, 2025	MathMathematical Problem-Solving	CodeCode Available	0
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs	May 16, 2025	Mathematical Problem-SolvingReinforcement Learning (RL)	—Unverified	0

Show:10 25 50

← PrevPage 1 of 3Next →

No leaderboard results yet.