SOTAVerified

GSM8K

Papers

Showing 151-200 of 439 papers

Title | Status | Hype
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Code | 1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
Design of Chain-of-Thought in Math Problem Solving | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs | Code | 1
Self-Consistency Improves Chain of Thought Reasoning in Language Models | Code | 1
Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | | 0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | | 0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | | 0
A Careful Examination of Large Language Model Performance on Grade School Arithmetic | | 0
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | | 0
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | | 0
Cool-Fusion: Fuse Large Language Models without Training | | 0
Automatic Prompt Selection for Large Language Models | | 0
ControlMath: Controllable Data Generation Promotes Math Generalist Models | | 0
Maximizing Confidence Alone Improves Reasoning | | 0
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | | 0
Exploring an LM to generate Prolog Predicates from Mathematics Questions | | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | | 0
Contrastive Decoding Improves Reasoning in Large Language Models | | 0
Excessive Reasoning Attack on Reasoning LLMs | | 0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | | 0
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | | 0
Evaluation of LLMs for mathematical problem solving | | 0
Complexity-Based Prompting for Multi-Step Reasoning | | 0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | | 0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | | 0
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
AutoJudge: Judge Decoding Without Manual Annotation | | 0
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | 0
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | | 0
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0
Efficient Data Selection at Scale via Influence Distillation | | 0
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | | 0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0
Dynamic Parallel Tree Search for Efficient LLM Reasoning | | 0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0
LiteSearch: Efficacious Tree Search for LLM | | 0
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | | 0
Leveraging Uncertainty Estimation for Efficient LLM Routing | | 0
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | | 0
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | | 0
MathDivide: Improved mathematical reasoning by large language models | | 0
Page 4 of 9

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified