SOTAVerified

GSM8K

Papers

Showing 11-20 of 439 papers

Title | Status | Hype
Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Code | 4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Code | 4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Code | 4
Baichuan 2: Open Large-scale Language Models | Code | 4
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Code | 4
ReFT: Reasoning with Reinforced Fine-Tuning | Code | 4
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified