SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 81–90 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models	Aug 3, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	2	5
Language Models as Science Tutors	Feb 16, 2024	GSM8KMath	CodeCode Available	1	5
Large Language Models are Contrastive Reasoners	Mar 13, 2024	GSM8K	CodeCode Available	1	5
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8KHumanEval	CodeCode Available	1	5
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning	Jan 27, 2023	Few-Shot LearningGSM8K	CodeCode Available	1	5
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates	Feb 28, 2024	GSM8KSafety Alignment	CodeCode Available	1	5
Automatic Model Selection with Large Language Models for Reasoning	May 23, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	1	5
Matrix Information Theory for Self-Supervised Learning	May 27, 2023	Contrastive LearningGSM8K	CodeCode Available	1	5
CommVQ: Commutative Vector Quantization for KV Cache Compression	Jun 23, 2025	GPUGSM8K	CodeCode Available	1	5
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning	Oct 8, 2024	GSM8KMulti-agent Reinforcement Learning	CodeCode Available	1	5

Show:10 25 50

← PrevPage 9 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified