SOTAVerified

GSM8K

Papers

Showing 421–430 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Feb 27, 2025	GSM8K, HumanEval	Unverified	0	0
Large Language Models Can Self-Improve	Oct 20, 2022	Arithmetic Reasoning, Common Sense Reasoning	Unverified	0	0
LiteSearch: Efficacious Tree Search for LLM	Jun 29, 2024	GSM8K, Mathematical Reasoning	Unverified	0	0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	May 25, 2025	GSM8K, HumanEval	Unverified	0	0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ	Sep 25, 2024	Chatbot, GSM8K	Unverified	0	0
Training Large Language Models to Reason via EM Policy Gradient	Apr 24, 2025	GSM8K, Math	Unverified	0	0
Large Language Models as Analogical Reasoners	Oct 3, 2023	Code Generation, GSM8K	Unverified	0	0
KwaiYiiMath: Technical Report	Oct 11, 2023	Arithmetic Reasoning, GSM8K	Unverified	0	0
Kwai-STaR: Transform LLMs into State-Transition Reasoners	Nov 7, 2024	GSM8K, Mathematical Problem-Solving	Unverified	0	0
Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications	May 14, 2024	GSM8K, Math	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified