SOTAVerified

GSM8K

Papers

Showing 51-60 of 439 papers

Title | Status | Hype
Preference Optimization for Reasoning with Pseudo Feedback | Code | 2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Code | 2
Meta Prompting for AI Systems | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
ProcessBench: Identifying Process Errors in Mathematical Reasoning | Code | 2
Progressive-Hint Prompting Improves Reasoning in Large Language Models | Code | 2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified