SOTAVerified

GSM8K

Papers

Showing 401-439 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| Automatic Prompt Selection for Large Language Models | | 0 |
| AutoJudge: Judge Decoding Without Manual Annotation | | 0 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0 |
| Towards Multilingual LLM Evaluation for European Languages | | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | | 0 |
| Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions | | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | | 0 |
| Arithmetic Reasoning with LLM: Prolog Generation & Permutation | | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | | 0 |
| Training Chain-of-Thought via Latent-Variable Inference | | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | | 0 |
| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | | 0 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | | 0 |
| Learning to Reason via Self-Iterative Process Feedback for Small Language Models | | 0 |
| LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint | | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | | 0 |
| Let's Reinforce Step by Step | | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | | 0 |
| Leveraging Uncertainty Estimation for Efficient LLM Routing | | 0 |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | | 0 |
| Large Language Models Can Self-Improve | | 0 |
| LiteSearch: Efficacious Tree Search for LLM | | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | | 0 |
| Large Language Models as Analogical Reasoners | | 0 |
| KwaiYiiMath: Technical Report | | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | | 0 |
| AgentInstruct: Toward Generative Teaching with Agentic Flows | | 0 |
| DavIR: Data Selection via Implicit Reward for Large Language Models | | 0 |
| Local Prompt Optimization | | 0 |
| Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | | 0 |
| Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | | 0 |
| KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | | 0 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |