SOTAVerified

GSM8K

Papers

Showing 401-425 of 439 papers

Title | Status | Hype
Automatic Prompt Selection for Large Language Models | | 0
AutoJudge: Judge Decoding Without Manual Annotation | | 0
Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | | 0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0
Towards Multilingual LLM Evaluation for European Languages | | 0
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | | 0
Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions | | 0
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | | 0
Arithmetic Reasoning with LLM: Prolog Generation & Permutation | | 0
Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | | 0
Training Chain-of-Thought via Latent-Variable Inference | | 0
Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | | 0
A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | | 0
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | | 0
Learning to Reason via Self-Iterative Process Feedback for Small Language Models | | 0
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint | | 0
LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | | 0
Let's Reinforce Step by Step | | 0
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | | 0
Leveraging Uncertainty Estimation for Efficient LLM Routing | | 0
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | | 0
Large Language Models Can Self-Improve | | 0
LiteSearch: Efficacious Tree Search for LLM | | 0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0
Page 17 of 18

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified