SOTAVerified

GSM8K

Papers

Showing 176-200 of 439 papers

Title | Status | Hype
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | | 0
Evaluation of LLMs for mathematical problem solving | | 0
Complexity-Based Prompting for Multi-Step Reasoning | | 0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | | 0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | | 0
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
AutoJudge: Judge Decoding Without Manual Annotation | | 0
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | 0
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | | 0
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0
Efficient Data Selection at Scale via Influence Distillation | | 0
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | | 0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0
Dynamic Parallel Tree Search for Efficient LLM Reasoning | | 0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0
LiteSearch: Efficacious Tree Search for LLM | | 0
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | | 0
Leveraging Uncertainty Estimation for Efficient LLM Routing | | 0
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | | 0
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | | 0
MathDivide: Improved mathematical reasoning by large language models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified