Math

Papers

Showing 426–450 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning	Jul 4, 2024	Avg, GSM8K	Code Available	1	5
OJBench: A Competition Level Code Benchmark For Large Language Models	Jun 19, 2025	Math	Code Available	1	5
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1	5
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Feb 27, 2025	GSM8K, Math	Code Available	1	5
FormulaNet: A Benchmark Dataset for Mathematical Formula Detection	Aug 29, 2022	Math	Code Available	1	5
Expression Syntax Information Bottleneck for Math Word Problems	Oct 24, 2023	Math	Code Available	1	5
Explaining Datasets in Words: Statistical Models with Natural Language Parameters	Sep 13, 2024	Clustering, Language Modeling	Code Available	1	5
EXAONE Deep: Reasoning Enhanced Language Models	Mar 16, 2025	Math	Code Available	1	5
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective	Jun 22, 2025	In-Context Learning, Large Language Model	Code Available	1	5
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning	Aug 10, 2022	Math, Mathematical Reasoning	Code Available	1	5
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning	Sep 29, 2022	Logical Reasoning, Math	Code Available	1	5
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing	Apr 2, 2025	3D Reconstruction, Benchmarking	Code Available	1	5
Non-Autoregressive Math Word Problem Solver with Unified Tree Structure	May 8, 2023	Math, valid	Code Available	1	5
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis	May 1, 2025	GSM8K, Math	Code Available	1	5
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula	Aug 8, 2024	GSM8K, Language Modeling	Code Available	1	5
Nerva: a Truly Sparse Implementation of Neural Networks	Jul 24, 2024	Math	Code Available	1	5
Aioli: A Unified Optimization Framework for Language Model Data Mixing	Nov 8, 2024	Language Modeling, Language Modelling	Code Available	1	5
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning	Sep 19, 2023	Instruction Following, Language Modeling	Code Available	1	5
Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks	Jul 3, 2021	Decoder, Math	Code Available	1	5
CityGPT: Empowering Urban Spatial Cognition of Large Language Models	Jun 20, 2024	Code Generation, Math	Code Available	1	5
Mathematical Capabilities of ChatGPT	Jan 31, 2023	Elementary Mathematics, Math	Code Available	1	5
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents	Aug 2, 2024	Code Generation, Large Language Model	Code Available	1	5
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	Chatbot, Language Modeling	Code Available	1	5
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning	Jun 4, 2023	Math	Code Available	1	5
MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers	Sep 2, 2021	Math, Math Word Problem Solving	Code Available	1	5
