SOTAVerified

GSM8K

Papers

Showing 251–275 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| Iterative Reasoning Preference Optimization | | 0 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0 |
| KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | | 0 |
| KwaiYiiMath: Technical Report | | 0 |
| Large Language Models as Analogical Reasoners | | 0 |
| Large Language Models Can Self-Improve | | 0 |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | | 0 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | | 0 |
| Learning to Reason via Self-Iterative Process Feedback for Small Language Models | | 0 |
| LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint | | 0 |
| Let's Reinforce Step by Step | | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | | 0 |
| Leveraging Uncertainty Estimation for Efficient LLM Routing | | 0 |
| LiteSearch: Efficacious Tree Search for LLM | | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | | 0 |
| DavIR: Data Selection via Implicit Reward for Large Language Models | | 0 |
| Local Prompt Optimization | | 0 |
| Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | | 0 |
| Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | | 0 |
| MALT: Improving Reasoning with Multi-Agent LLM Training | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |