SOTAVerified

GSM8K

Papers

Showing 311–320 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models	Apr 18, 2024	GSM8K, MMLU	Unverified	0	0
Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?	Apr 19, 2024	GSM8K	Unverified	0	0
Reliable Reasoning Beyond Natural Language	Jul 16, 2024	GSM8K, Mathematical Reasoning	Unverified	0	0
Rethinking Data Synthesis: A Teacher Model Training Recipe with Interpretation	Oct 27, 2024	GSM8K, Language Modeling	Unverified	0	0
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models	May 20, 2025	GSM8K, Mathematical Reasoning	Unverified	0	0
RevOrder: A Novel Method for Enhanced Arithmetic in Language Models	Feb 6, 2024	GSM8K, Math	Unverified	0	0
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs	Dec 30, 2024	GSM8K	Unverified	0	0
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs	May 19, 2025	GSM8K	Unverified	0	0
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations	Jan 25, 2025	Computational Efficiency, GSM8K	Unverified	0	0
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models	Mar 14, 2025	Checkmate In One, GSM8K	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified