SOTAVerified

GSM8K

Papers

Showing 351-400 of 439 papers

Title | Status | Hype
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
PORT: Preference Optimization on Reasoning Traces | | 0
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Code | 0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | | 0
Can LLMs Reason in the Wild with Programs? | Code | 0
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0
Uncertainty Aware Learning for Language Model Alignment | | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training | | 0
Improve Mathematical Reasoning in Language Models by Automated Process Supervision | | 0
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment | Code | 0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | | 0
Arithmetic Reasoning with LLM: Prolog Generation & Permutation | | 0
Multi-Reference Preference Optimization for Large Language Models | | 0
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | | 0
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | | 0
Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | | 0
MathDivide: Improved mathematical reasoning by large language models | | 0
MAmmoTH2: Scaling Instructions from the Web | | 0
A Careful Examination of Large Language Model Performance on Grade School Arithmetic | | 0
Iterative Reasoning Preference Optimization | | 0
PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0
Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? | | 0
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | | 0
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning | | 0
Automatic Prompt Selection for Large Language Models | | 0
Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0
Supervisory Prompt Training | | 0
Self-Consistency Boosts Calibration for Math Reasoning | | 0
Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0
MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Code | 0
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | | 0
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | | 0
Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0
Can Separators Improve Chain-of-Thought Prompting? | | 0
Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0
Premise Order Matters in Reasoning with Large Language Models | | 0
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | | 0
The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0
In-Context Principle Learning from Mistakes | Code | 0
RevOrder: A Novel Method for Enhanced Arithmetic in Language Models | | 0
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | | 0
YODA: Teacher-Student Progressive Learning for Language Models | | 0
Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | | 0
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | | 0
From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | | 0
TinyGSM: achieving >80% on GSM8k with small language models | | 0
Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | | 0
Training Chain-of-Thought via Latent-Variable Inference | | 0
Page 8 of 9

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified