SOTAVerified

GSM8K

Papers

Showing 51–60 of 439 papers

Title | Status | Hype
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | Code | 2
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | Code | 2
ProcessBench: Identifying Process Errors in Mathematical Reasoning | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
Meta Prompting for AI Systems | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | - | Unverified
2 | Orange-mini | 0-shot MRR | 98 | - | Unverified