SOTAVerified

Math

Papers

Showing 391–400 of 1596 papers

Title | Status | Hype
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Code | 1
Language Models as Science Tutors | Code | 1
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Code | 1
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Code | 1
Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation | Code | 1
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Code | 1
ReGAL: Refactoring Programs to Discover Generalizable Abstractions | Code | 1
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks | Code | 1
Over-Reasoning and Redundant Calculation of Large Language Models | Code | 1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1
Page 40 of 160

No leaderboard results yet.