Mathematical Problem-Solving

Papers

Showing 31–40 of 106 papers

Title	Date	Tasks	Status	Hype	Score
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning	Jun 5, 2025	Dataset Generation, Mathematical Problem-Solving	Code Available	1	5
Non-myopic Generation of Language Models for Reasoning and Planning	Oct 22, 2024	Computational Efficiency, Language Modelling	Code Available	1	5
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models	Feb 16, 2025	Language Modeling, Language Modelling	Code Available	1	5
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion	Mar 20, 2025	Data Augmentation, Mathematical Problem-Solving	Code Available	1	5
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks	Apr 23, 2024	Mathematical Problem-Solving, Question Answering	Code Available	1	5
RaDeR: Reasoning-aware Dense Retrieval Models	May 23, 2025	Math, Mathematical Problem-Solving	Code Available	1	5
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	Math, Mathematical Problem-Solving	Code Available	1	5
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision	May 26, 2025	Hallucination, Math	Code Available	0	5
Benchmarking Large Language Models for Math Reasoning Tasks	Aug 20, 2024	Benchmarking, In-Context Learning	Code Available	0	5
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation	Jun 8, 2025	Code Generation, Mathematical Problem-Solving	Code Available	0	5

No leaderboard results yet.