
GSM8K

Papers

Showing 71–80 of 439 papers

Title | Status | Hype
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | Code | 2
SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Code | 2
Meta Prompting for AI Systems | Code | 2
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | Code | 2
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Code | 2
Progressive-Hint Prompting Improves Reasoning in Large Language Models | Code | 2
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 |  | Unverified
2 | Orange-mini | 0-shot MRR | 98 |  | Unverified
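The leaderboard reports "Accuracy" for GSM8K but does not say how it is scored. A common convention is exact match on the final numeric answer: GSM8K reference solutions end with a line of the form `#### <answer>`, and a prediction counts as correct if its final number equals that answer. The sketch below illustrates that convention only. It is an assumption, not necessarily how the claimed scores above were computed. The helper names `extract_answer` and `exact_match_accuracy` are hypothetical, and the rule of taking the last number in a free-form prediction is an assumed heuristic. It does not cover the 0-shot MRR row.

```python
import re
from decimal import Decimal
from typing import List, Optional

# Matches integers or decimals, allowing thousands separators (e.g. "1,000.5").
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_answer(text: str) -> Optional[str]:
    """Return the final numeric answer in `text`, or None if there is none.

    GSM8K reference solutions end with "#### <answer>", so only the text after
    the last "####" is searched when that marker is present. Otherwise (e.g.
    free-form model output) the last number in the text is taken; this
    fallback is an assumed heuristic, not a standard.
    """
    if "####" in text:
        text = text.split("####")[-1]
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators so "1,000" and "1000" compare equal.
    return matches[-1].replace(",", "")


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of items whose extracted numeric answers are equal."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = extract_answer(pred), extract_answer(ref)
        # Decimal comparison treats "72" and "72.00" as the same answer.
        if p is not None and r is not None and Decimal(p) == Decimal(r):
            correct += 1
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    refs = ["Natalia sold 48/2 = 24 clips in May.\n#### 72", "Total: 5 + 5\n#### 10"]
    preds = ["She sold 72 clips altogether.", "The answer is 12."]
    print(exact_match_accuracy(preds, refs))  # 50.0
```

Comparing answers as `Decimal` values rather than raw strings avoids penalizing formatting differences such as trailing zeros, while still treating genuinely different numbers as wrong.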