SOTAVerified

GSM8K

Papers

Showing 371-380 of 439 papers

Title | Status | Hype
SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks | | 0
Let's Reinforce Step by Step | | 0
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | Code | 1
Prompt Engineering a Prompt Engineer | | 0
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | Code | 2
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | | 0
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Code | 1
Learning From Mistakes Makes LLM Better Reasoner | Code | 1
SkyMath: Technical Report | Code | 3
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0
Page 38 of 44

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified