SOTAVerified

GSM8K

Papers

Showing 11–20 of 439 papers

Title | Status | Hype
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Code | 5
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Code | 4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Code | 4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Code | 4
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4
ReFT: Reasoning with Reinforced Fine-Tuning | Code | 4
Baichuan 2: Open Large-scale Language Models | Code | 4
Thinkless: LLM Learns When to Think | Code | 3
Page 2 of 44

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | - | Unverified
2 | Orange-mini | 0-shot MRR | 98 | - | Unverified