SOTAVerified

GSM8K

Papers

Showing 401-439 of 439 papers

Title | Status | Hype
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations | Code | 0
First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | - | 0
SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks | - | 0
The ART of LLM Refinement: Ask, Refine, and Trust | - | 0
Let's Reinforce Step by Step | - | 0
Prompt Engineering a Prompt Engineer | - | 0
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | - | 0
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | - | 0
DavIR: Data Selection via Implicit Reward for Large Language Models | - | 0
KwaiYiiMath: Technical Report | - | 0
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | - | 0
Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0
Think before you speak: Training Language Models With Pause Tokens | - | 0
Large Language Models as Analogical Reasoners | - | 0
Adapting LLM Agents with Universal Feedback in Communication | - | 0
UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | - | 0
Contrastive Decoding Improves Reasoning in Large Language Models | - | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0
Exploring an LM to generate Prolog Predicates from Mathematics Questions | - | 0
MathAttack: Attacking Large Language Models Towards Math Solving Ability | - | 0
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | - | 0
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Code | 0
A mixed policy to improve performance of language models on math problems | Code | 0
DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | - | 0
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning | - | 0
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | Code | 0
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | Code | 0
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | - | 0
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | - | 0
Self-Evaluation Guided Beam Search for Reasoning | - | 0
Teaching Small Language Models to Reason | - | 0
Distilling Reasoning Capabilities into Smaller Language Models | Code | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | - | 0
Solving math word problems with process- and outcome-based feedback | - | 0
Large Language Models Can Self-Improve | - | 0
Transcending Scaling Laws with 0.1% Extra Compute | - | 0
Complexity-Based Prompting for Multi-Step Reasoning | - | 0
Making Large Language Models Better Reasoners with Step-Aware Verifier | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | - | Unverified
2 | Orange-mini | 0-shot MRR | 98 | - | Unverified