SOTAVerified

Math

Papers

Showing 201–225 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts	Oct 3, 2023	Chatbot, Image Captioning	Code Available	2	5
RM-R1: Reward Modeling as Reasoning	May 5, 2025	Math, Reinforcement Learning (RL)	Code Available	2	5
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark	May 20, 2024	College Mathematics, GSM8K	Code Available	2	5
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code	Oct 10, 2024	Math, Mathematical Reasoning	Code Available	2	5
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8K, Math	Code Available	2	5
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models	May 15, 2025	Math, reinforcement-learning	Code Available	2	5
Full Page Handwriting Recognition via Image to Sequence Extraction	Mar 11, 2021	Handwriting Recognition, Handwritten Text Recognition	Code Available	2	5
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	Oct 5, 2023	Arithmetic Reasoning, GSM8K	Code Available	2	5
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems	Apr 6, 2024	Logical Reasoning, Math	Code Available	2	5
Agent Lumos: Unified and Modular Training for Open-Source Language Agents	Nov 9, 2023	Math, Question Answering	Code Available	2	5
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models	Apr 7, 2025	Dialogue Evaluation, Fairness	Code Available	2	5
Language Models are Multilingual Chain-of-Thought Reasoners	Oct 6, 2022	GSM8K, Math	Code Available	2	5
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning	Sep 11, 2023	Math, Mathematical Reasoning	Code Available	2	5
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic	Feb 19, 2024	Instruction Following, Math	Code Available	2	5
Evaluating Mathematical Reasoning Beyond Accuracy	Apr 8, 2024	Math, Mathematical Reasoning	Code Available	2	5
Can AI Assistants Know What They Don't Know?	Jan 24, 2024	Math, Open-Domain Question Answering	Code Available	2	5
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model	Oct 17, 2024	Math	Code Available	2	5
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters	May 27, 2024	Benchmarking, GSM8K	Code Available	2	5
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Sep 4, 2024	GSM8K, Math	Code Available	2	5
Essential-Web v1.0: 24T tokens of organized web data	Jun 17, 2025	Math	Code Available	2	5
Dynamic Early Exit in Reasoning Models	Apr 22, 2025	GSM8K, Math	Code Available	2	5
Balancing LoRA Performance and Efficiency with Simple Shard Sharing	Sep 19, 2024	Computational Efficiency, GSM8K	Code Available	2	5
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	Nov 24, 2024	Math, Mixture-of-Experts	Code Available	2	5
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap	Feb 29, 2024	Math	Code Available	2	5
Learning to Reason for Long-Form Story Generation	Mar 28, 2025	Form, Math	Code Available	2	5

Page 9 of 64

No leaderboard results yet.