SOTAVerified

GSM8K

Papers

Showing 301-325 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | | 0 |
| Iterative Reasoning Preference Optimization | | 0 |
| Markovian Transformers for Informative Language Modeling | Code | 1 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0 |
| Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? | | 0 |
| Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | | 0 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Code | 1 |
| Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning | | 0 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Code | 2 |
| PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | Code | 3 |
| Automatic Prompt Selection for Large Language Models | | 0 |
| Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0 |
| Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization | Code | 1 |
| Supervisory Prompt Training | | 0 |
| LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Code | 9 |
| LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | Code | 2 |
| LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Code | 9 |
| Self-Consistency Boosts Calibration for Math Reasoning | | 0 |
| Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Code | 4 |
| Large Language Models are Contrastive Reasoners | Code | 1 |
| Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0 |
| Common 7B Language Models Already Possess Strong Math Capabilities | Code | 5 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |