SOTAVerified|Agents Browse Leaderboard About

Math

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 276–300 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
MathPrompter: Mathematical Reasoning using Large Language Models	Mar 4, 2023	Arithmetic ReasoningMath	CodeCode Available	1	5
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers	Dec 7, 2023	MathMultiple-choice	CodeCode Available	1	5
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	MathMathematical Problem-Solving	CodeCode Available	1	5
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning	Aug 16, 2024	MathMathematical Reasoning	CodeCode Available	1	5
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Feb 27, 2025	GSM8KMath	CodeCode Available	1	5
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes	Oct 22, 2024	GSM8KLanguage Modeling	CodeCode Available	1	5
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection	Aug 29, 2022	Math	CodeCode Available	1	5
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation	Apr 15, 2025	MathQuantum Machine Learning	CodeCode Available	1	5
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	Dec 14, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	1	5
Math Word Problem Solving with Explicit Numerical Values	Aug 1, 2021	MathMath Word Problem Solving	CodeCode Available	1	5
A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level	Dec 31, 2021	Few-Shot LearningLanguage Modelling	CodeCode Available	1	5
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems	May 23, 2023	Language ModellingLarge Language Model	CodeCode Available	1	5
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start	May 28, 2025	MathMultimodal Reasoning	CodeCode Available	1	5
Expression Syntax Information Bottleneck for Math Word Problems	Oct 24, 2023	Math	CodeCode Available	1	5
Mathematical Capabilities of ChatGPT	Jan 31, 2023	Elementary MathematicsMath	CodeCode Available	1	5
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling	Jul 13, 2024	BenchmarkingMath	CodeCode Available	1	5
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	BenchmarkingGeneral Knowledge	CodeCode Available	1	5
MathChat: Converse to Tackle Challenging Math Problems with LLM Agents	Jun 2, 2023	Elementary MathematicsMath	CodeCode Available	1	5
MathGloss: Building mathematical glossaries from text	Nov 21, 2023	Math	CodeCode Available	1	5
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search	Sep 26, 2024	MathMathematical Problem-Solving	CodeCode Available	1	5
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	BenchmarkingMath	CodeCode Available	1	5
EXAONE Deep: Reasoning Enhanced Language Models	Mar 16, 2025	Math	CodeCode Available	1	5
Explaining Datasets in Words: Statistical Models with Natural Language Parameters	Sep 13, 2024	ClusteringLanguage Modeling	CodeCode Available	1	5
An Early Evaluation of GPT-4V(ision)	Oct 25, 2023	Math	CodeCode Available	1	5
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective	Jun 22, 2025	In-Context LearningLarge Language Model	CodeCode Available	1	5

Show:10 25 50

← PrevPage 12 of 64Next →

No leaderboard results yet.