SOTAVerified

Math

Papers

Showing 476–500 of 1596 papers

Title	Date	Tasks	Status	Hype
Toward Adaptive Reasoning in Large Language Models with Thought Rollback	Jul 21, 2024	Arithmetic Reasoning, Math	Code Available	1
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers	Jun 1, 2022	Math	Code Available	1
Towards an AI to Win Ghana's National Science and Maths Quiz	Aug 8, 2023	Math, Question Answering	Code Available	1
Large Language Models Are Neurosymbolic Reasoners	Jan 17, 2024	Common Sense Reasoning, Math	Code Available	1
How to Get Your LLM to Generate Challenging Problems for Evaluation	Feb 20, 2025	Code Completion, Math	Code Available	1
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8K, Math	Code Available	1
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation	Dec 28, 2023	GSM8K, Language Model Evaluation	Code Available	1
How well do Large Language Models perform in Arithmetic tasks?	Mar 16, 2023	Math	Code Available	1
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning	May 30, 2025	Math, Mathematical Reasoning	Code Available	1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics	Oct 13, 2024	Language Modeling, Language Modelling	Code Available	1
HARP: A challenging human-annotated math reasoning benchmark	Dec 11, 2024	Math	Code Available	1
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	Apr 23, 2024	Arithmetic Reasoning, GSM8K	Code Available	1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems	May 17, 2025	Arithmetic Reasoning, Code Generation	Code Available	1
Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education	Jan 30, 2023	Math, Position	Code Available	1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models	Jun 4, 2025	Math	Code Available	1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning	Jan 19, 2024	GSM8K, Math	Code Available	1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles	Jun 18, 2024	Arithmetic Reasoning, Code Generation	Code Available	1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	Chatbot, Language Modeling	Code Available	1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning	Jun 4, 2023	Math	Code Available	1
Implicit Chain of Thought Reasoning via Knowledge Distillation	Nov 2, 2023	Knowledge Distillation, Math	Code Available	1
Are NLP Models really able to Solve Simple Math Word Problems?	Mar 12, 2021	Math, Math Word Problem Solving	Code Available	1
Case-Based or Rule-Based: How Do Transformers Do the Math?	Feb 27, 2024	Math, Systematic Generalization	Code Available	1
Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem	Apr 7, 2020	Decoder, Machine Translation	Code Available	1
CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models	Dec 23, 2024	Decision Making, Math	Code Available	1
AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation	Apr 25, 2024	Code Generation, Math	Code Available	1

