SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 191–200 of 439 papers

Title	Date	Tasks	Status	Hype
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection	May 12, 2025	GSM8KHumanEval	—Unverified	0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ	Sep 25, 2024	ChatbotGSM8K	—Unverified	0
Dynamic Parallel Tree Search for Efficient LLM Reasoning	Feb 22, 2025	Computational EfficiencyGSM8K	—Unverified	0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	May 25, 2025	GSM8KHumanEval	—Unverified	0
LiteSearch: Efficacious Tree Search for LLM	Jun 29, 2024	GSM8KMathematical Reasoning	—Unverified	0
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities	Dec 22, 2023	ChatbotGSM8K	—Unverified	0
Leveraging Uncertainty Estimation for Efficient LLM Routing	Feb 16, 2025	GSM8KMMLU	—Unverified	0
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models	Nov 13, 2024	GSM8K	—Unverified	0
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning	Oct 16, 2023	Code GenerationGSM8K	—Unverified	0
MathDivide: Improved mathematical reasoning by large language models	May 12, 2024	GSM8KLogical Reasoning	—Unverified	0

Show:10 25 50

← PrevPage 20 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified