SOTAVerified

GSM8K

Papers

Showing 201-225 of 439 papers

Title | Status | Hype
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2
Dialectical Behavior Therapy Approach to LLM Prompting | - | 0
Think Beyond Size: Adaptive Prompting for More Effective Reasoning | - | 0
Subtle Errors Matter: Preference Learning via Error-injected Self-editing | - | 0
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches | - | 0
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning | Code | 1
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | - | 0
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | - | 0
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Code | 1
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | - | 0
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Code | 0
BrainTransformers: SNN-LLM | - | 0
Unlocking Structured Thinking in Language Models with Cognitive Prompting | - | 0
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | - | 0
The Role of Deductive and Inductive Reasoning in Large Language Models | - | 0
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | - | 0
PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | - | 0
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Code | 2
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Code | 0
Instance-adaptive Zero-shot Chain-of-Thought Prompting | - | 0
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning | - | 0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | - | 0
Uncovering Latent Chain of Thought Vectors in Language Models | - | 0
Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks | Code | 1
ControlMath: Controllable Data Generation Promotes Math Generalist Models | - | 0
Page 9 of 18

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | - | Unverified
2 | Orange-mini | 0-shot MRR | 98 | - | Unverified