SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 426–439 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Training Large Language Models to Reason via EM Policy Gradient	Apr 24, 2025	GSM8KMath	—Unverified	0	0
Large Language Models as Analogical Reasoners	Oct 3, 2023	Code GenerationGSM8K	—Unverified	0	0
KwaiYiiMath: Technical Report	Oct 11, 2023	Arithmetic ReasoningGSM8K	—Unverified	0	0
Kwai-STaR: Transform LLMs into State-Transition Reasoners	Nov 7, 2024	GSM8KMathematical Problem-Solving	—Unverified	0	0
Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications	May 14, 2024	GSM8KMath	—Unverified	0	0
AgentInstruct: Toward Generative Teaching with Agentic Flows	Jul 3, 2024	GSM8KMMLU	—Unverified	0	0
DavIR: Data Selection via Implicit Reward for Large Language Models	Oct 16, 2023	Causal Language ModelingGSM8K	—Unverified	0	0
Local Prompt Optimization	Apr 29, 2025	GSM8KMath	—Unverified	0	0
Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems	Aug 29, 2024	GSM8KLanguage Modeling	—Unverified	0	0
Transcending Scaling Laws with 0.1% Extra Compute	Oct 20, 2022	Arithmetic ReasoningCross-Lingual Question Answering	—Unverified	0	0
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models	Feb 24, 2024	GSM8KMathematical Reasoning	—Unverified	0	0
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?	Jul 15, 2025	GSM8KLanguage Modeling	—Unverified	0	0
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	Mar 4, 2024	GSM8KMath	—Unverified	0	0
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing	Jun 17, 2025	ARCCoLA	—Unverified	0	0

Show:10 25 50

← PrevPage 18 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified