
Math

Papers


Showing 351–375 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset	Nov 9, 2023	Math, Natural Language Understanding	Code Available	1	5
EXAONE Deep: Reasoning Enhanced Language Models	Mar 16, 2025	Math	Code Available	1	5
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective	Jun 22, 2025	In-Context Learning, Large Language Model	Code Available	1	5
Explaining Datasets in Words: Statistical Models with Natural Language Parameters	Sep 13, 2024	Clustering, Language Modeling	Code Available	1	5
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models	Oct 21, 2022	Math, Mathematical Reasoning	Code Available	1	5
Expression Syntax Information Bottleneck for Math Word Problems	Oct 24, 2023	Math	Code Available	1	5
GOLD: Geometry Problem Solver with Natural Language Description	May 1, 2024	Math	Code Available	1	5
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8K, HumanEval	Code Available	1	5
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis	May 1, 2025	GSM8K, Math	Code Available	1	5
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts	Oct 23, 2023	Logical Reasoning, Math	Code Available	1	5
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning	Jun 4, 2023	Math	Code Available	1	5
NLPBench: Evaluating Large Language Models on Solving NLP Problems	Sep 27, 2023	Benchmarking, Math	Code Available	1	5
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models	Feb 22, 2024	Math, Mathematical Reasoning	Code Available	1	5
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	Chatbot, Language Modeling	Code Available	1	5
Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices	Mar 19, 2024	Math	Code Available	1	5
HARP: A challenging human-annotated math reasoning benchmark	Dec 11, 2024	Math	Code Available	1	5
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning	Jan 19, 2024	GSM8K, Math	Code Available	1	5
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports	May 16, 2025	Diagnostic, Math	Code Available	1	5
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula	Aug 8, 2024	GSM8K, Language Modeling	Code Available	1	5
A Symbolic Character-Aware Model for Solving Geometry Problems	Aug 5, 2023	Math, Multi-Label Classification	Code Available	1	5
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models	Apr 8, 2025	Math, Multimodal Reasoning	Code Available	1	5
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8K, Math	Code Available	1	5
Entropy-Regularized Process Reward Model	Dec 15, 2024	GSM8K, Math	Code Available	1	5
Math Word Problem Solving with Explicit Numerical Values	Aug 1, 2021	Math, Math Word Problem Solving	Code Available	1	5
Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions	Jun 7, 2021	Math, Question Answering	Code Available	1	5

