SOTAVerified

GSM8K

Papers

Showing 401-425 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations | Code | 0 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | | 0 |
| SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks | | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | | 0 |
| Let's Reinforce Step by Step | | 0 |
| Prompt Engineering a Prompt Engineer | | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | | 0 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | | 0 |
| DavIR: Data Selection via Implicit Reward for Large Language Models | | 0 |
| KwaiYiiMath: Technical Report | | 0 |
| From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference | | 0 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0 |
| Think before you speak: Training Language Models With Pause Tokens | | 0 |
| Large Language Models as Analogical Reasoners | | 0 |
| Adapting LLM Agents with Universal Feedback in Communication | | 0 |
| UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | | 0 |
| EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0 |
| Exploring an LM to generate Prolog Predicates from Mathematics Questions | | 0 |
| MathAttack: Attacking Large Language Models Towards Math Solving Ability | | 0 |
| No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | | 0 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Code | 0 |
| A mixed policy to improve performance of language models on math problems | Code | 0 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |