
Math

Papers


Showing 231–240 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation	Feb 26, 2025	Code Generation, HumanEval	Code Available	2	5
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction	Apr 21, 2025	Math	Code Available	2	5
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models	Oct 10, 2024	Math	Code Available	2	5
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment	Oct 2, 2024	GSM8K, Math	Code Available	2	5
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula	Aug 8, 2024	GSM8K, Language Modeling	Code Available	1	5
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods	Feb 3, 2025	Math, Mathematical Reasoning	Code Available	1	5
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning	Jul 11, 2025	Math, Mathematical Reasoning	Code Available	1	5
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning	Jun 4, 2023	Math	Code Available	1	5
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	Chatbot, Language Modeling	Code Available	1	5
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers	Jun 1, 2022	Math	Code Available	1	5

