SOTAVerified

GSM8K

Papers

Showing 301-350 of 439 papers

| Title | Status | Hype |
|---|---|---|
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | | 0 |
| Iterative Reasoning Preference Optimization | | 0 |
| Markovian Transformers for Informative Language Modeling | Code | 1 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0 |
| Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? | | 0 |
| Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | | 0 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Code | 1 |
| Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning | | 0 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Code | 2 |
| PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | Code | 3 |
| Automatic Prompt Selection for Large Language Models | | 0 |
| Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0 |
| Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization | Code | 1 |
| Supervisory Prompt Training | | 0 |
| LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Code | 9 |
| LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | Code | 2 |
| LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Code | 9 |
| Self-Consistency Boosts Calibration for Math Reasoning | | 0 |
| Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Code | 4 |
| Large Language Models are Contrastive Reasoners | Code | 1 |
| Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0 |
| Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Code | 0 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Code | 1 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Code | 2 |
| Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates | Code | 1 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | | 0 |
| Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | | 0 |
| Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0 |
| Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation | Code | 1 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0 |
| Reformatted Alignment | Code | 2 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0 |
| Language Models as Science Tutors | Code | 1 |
| Can Separators Improve Chain-of-Thought Prompting? | | 0 |
| OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Code | 4 |
| Premise Order Matters in Reasoning with Large Language Models | | 0 |
| GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | | 0 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Code | 4 |
| In-Context Principle Learning from Mistakes | Code | 0 |
| Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | Code | 2 |
| RevOrder: A Novel Method for Enhanced Arithmetic in Language Models | | 0 |
| Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | | 0 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |