SOTAVerified

GSM8K

Papers

Showing 176-200 of 439 papers

Title | Status | Hype
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0
Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0
Can LLMs Reason in the Wild with Programs? | Code | 0
Scaling Speculative Decoding with Lookahead Reasoning | Code | 0
DIVE: Diversified Iterative Self-Improvement | Code | 0
ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Code | 0
Distilling Reasoning Capabilities into Smaller Language Models | Code | 0
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0
Discriminative Policy Optimization for Token-Level Reward Models | Code | 0
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory | Code | 0
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | Code | 0
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | Code | 0
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Code | 0
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0
Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving | Code | 0
In-Context Principle Learning from Mistakes | Code | 0
A mixed policy to improve performance of language models on math problems | Code | 0
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Code | 0
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Code | 0
NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Code | 0
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified