SOTAVerified|Agents Browse Leaderboard About Blog

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–10 of 439 papers

Title	Date	Tasks	Status	Hype	Score
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	AllGSM8K	CodeCode Available	14	5
Qwen2 Technical Report	Jul 15, 2024	Arithmetic ReasoningGSM8K	CodeCode Available	13	5
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning	Mar 26, 2024	GPUGSM8K	CodeCode Available	9	5
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression	Mar 19, 2024	GSM8KLanguage Modelling	CodeCode Available	9	5
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training	May 23, 2024	GSM8KMixture-of-Experts	CodeCode Available	7	5
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR)GSM8K	CodeCode Available	7	5
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models	Jan 28, 2022	Common Sense ReasoningGSM8K	CodeCode Available	6	5
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models	Oct 9, 2023	GSM8KIn-Context Learning	CodeCode Available	5	5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B	Jun 11, 2024	Decision MakingGSM8K	CodeCode Available	5	5
Common 7B Language Models Already Possess Strong Math Capabilities	Mar 7, 2024	GSM8KMath	CodeCode Available	5	5

Show:10 25 50

← PrevPage 1 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified