SOTAVerified

GSM8K

Papers

Showing 326–350 of 439 papers

| Title | Status | Hype |
|---|---|---|
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Code | 1 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Code | 2 |
| Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates | Code | 1 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | | 0 |
| Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | | 0 |
| Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0 |
| Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation | Code | 1 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0 |
| Reformatted Alignment | Code | 2 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0 |
| Language Models as Science Tutors | Code | 1 |
| Can Separators Improve Chain-of-Thought Prompting? | | 0 |
| OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4 |
| Premise Order Matters in Reasoning with Large Language Models | | 0 |
| GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | | 0 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4 |
| In-Context Principle Learning from Mistakes | Code | 0 |
| Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | Code | 2 |
| RevOrder: A Novel Method for Enhanced Arithmetic in Language Models | | 0 |
| Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | | 0 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |