Math

Papers

Showing 341–350 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings	May 30, 2025	Math	Code Available	1	5
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset	Nov 9, 2023	Math, Natural Language Understanding	Code Available	1	5
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	Dec 14, 2023	Arithmetic Reasoning, GSM8K	Code Available	1	5
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning	Jun 4, 2023	Math	Code Available	1	5
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	Chatbot, Language Modeling	Code Available	1	5
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1	5
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models	Oct 21, 2022	Math, Mathematical Reasoning	Code Available	1	5
MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents	Jul 24, 2024	Math	Code Available	1	5
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data	Feb 14, 2024	Automated Theorem Proving, Language Modelling	Code Available	1	5
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8K, Math	Code Available	1	5

No leaderboard results yet.