Mathematical Problem-Solving

Papers

Showing 26–50 of 106 papers

Title	Date	Tasks	Status	Hype	Score
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning	Jun 5, 2025	Dataset Generation, Mathematical Problem-Solving	Code Available	1	5
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets	May 29, 2023	Bias Detection, Code Generation	Code Available	1	5
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	Math, Mathematical Problem-Solving	Code Available	1	5
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning	Jun 10, 2025	Knowledge Distillation, Math	Code Available	1	5
Training and Evaluating Language Models with Template-based Data Generation	Nov 27, 2024	Data Augmentation, Math	Code Available	1	5
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models	Feb 16, 2025	Language Modeling, Language Modelling	Code Available	1	5
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion	Mar 20, 2025	Data Augmentation, Mathematical Problem-Solving	Code Available	1	5
Solving Inequality Proofs with Large Language Models	Jun 9, 2025	Mathematical Problem-Solving, Relation Prediction	Code Available	1	5
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models	Jan 9, 2025	Benchmarking, Mathematical Problem-Solving	Code Available	1	5
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions	May 29, 2024	Benchmarking, Dialogue Understanding	Code Available	1	5
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula	Jul 1, 2024	Mathematical Problem-Solving	Code Available	1	5
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code Generation, HumanEval	Code Available	1	5
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision	May 26, 2025	Hallucination, Math	Code Available	0	5
Benchmarking Large Language Models for Math Reasoning Tasks	Aug 20, 2024	Benchmarking, In-Context Learning	Code Available	0	5
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation	Jun 8, 2025	Code Generation, Mathematical Problem-Solving	Code Available	0	5
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study	Mar 21, 2025	Attribute, Mathematical Problem-Solving	Code Available	0	5
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?	May 28, 2025	Math, Mathematical Problem-Solving	Code Available	0	5
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	May 14, 2025	Math, Mathematical Problem-Solving	Code Available	0	5
Data Contamination Through the Lens of Time	Oct 16, 2023	Mathematical Problem-Solving	Code Available	0	5
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving	Oct 19, 2023	GSM8K, Math	Code Available	0	5
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks	Apr 19, 2024	Mathematical Problem-Solving	Code Available	0	5
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class	May 17, 2025	Math, Mathematical Problem-Solving	Code Available	0	5
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory	Jun 18, 2024	Code Generation, Mathematical Problem-Solving	Code Available	0	5
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange	Mar 30, 2024	Math, Mathematical Problem-Solving	Code Available	0	5
A Survey on Mathematical Reasoning and Optimization with Large Language Models	Mar 22, 2025	Automated Theorem Proving, Heuristic Search	Code Available	0	5

No leaderboard results yet.