
Math

Papers


Showing 401–425 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
Collective Constitutional AI: Aligning a Language Model with Public Input	Jun 12, 2024	Language Modeling, Language Modelling	Code Available	1	5
A Categorical Archive of ChatGPT Failures	Feb 6, 2023	Math	Code Available	1	5
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis	May 1, 2025	GSM8K, Math	Code Available	1	5
NLPBench: Evaluating Large Language Models on Solving NLP Problems	Sep 27, 2023	Benchmarking, Math	Code Available	1	5
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data	Feb 14, 2024	Automated Theorem Proving, Language Modelling	Code Available	1	5
A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand	Apr 25, 2020	Math, Relation	Code Available	1	5
MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving	Jul 28, 2021	Common Sense Reasoning, Language Modeling	Code Available	1	5
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code Generation, HumanEval	Code Available	1	5
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective	Jun 22, 2025	In-Context Learning, Large Language Model	Code Available	1	5
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula	Aug 8, 2024	GSM8K, Language Modeling	Code Available	1	5
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8K, HumanEval	Code Available	1	5
MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers	Sep 2, 2021	Math, Math Word Problem Solving	Code Available	1	5
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent	Dec 14, 2023	Language Modeling, Language Modelling	Code Available	1	5
DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents	Nov 16, 2023	Math	Code Available	1	5
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning	Jan 19, 2024	GSM8K, Math	Code Available	1	5
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	Chatbot, Language Modeling	Code Available	1	5
EXAONE Deep: Reasoning Enhanced Language Models	Mar 16, 2025	Math	Code Available	1	5
Entropy-Regularized Process Reward Model	Dec 15, 2024	GSM8K, Math	Code Available	1	5
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8K, Math	Code Available	1	5
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning	Aug 10, 2022	Math, Mathematical Reasoning	Code Available	1	5
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning	Jun 4, 2023	Math	Code Available	1	5
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges	May 21, 2025	Math, valid	Code Available	1	5
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models	Aug 30, 2024	Image Captioning, Language Modeling	Code Available	1	5
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning	Sep 19, 2023	Instruction Following, Language Modeling	Code Available	1	5
Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs	Jan 11, 2025	Math, Mathematical Problem-Solving	Code Available	1	5


Page 17 of 64

No leaderboard results yet.