SOTAVerified

GSM8K

Papers

Showing 376–400 of 439 papers

| Title | Status | Hype |
|---|---|---|
| Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0 |
| Supervisory Prompt Training | | 0 |
| Self-Consistency Boosts Calibration for Math Reasoning | | 0 |
| Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Code | 0 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | | 0 |
| Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | | 0 |
| Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0 |
| Can Separators Improve Chain-of-Thought Prompting? | | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0 |
| Premise Order Matters in Reasoning with Large Language Models | | 0 |
| GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| In-Context Principle Learning from Mistakes | Code | 0 |
| RevOrder: A Novel Method for Enhanced Arithmetic in Language Models | | 0 |
| Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | | 0 |
| Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | | 0 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | | 0 |
| Training Chain-of-Thought via Latent-Variable Inference | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |