SOTAVerified|Agents Browse Leaderboard About

MMLU

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 321–330 of 340 papers

Title	Date	Tasks	Status	Hype	Score
Measuring Progress on Scalable Oversight for Large Language Models	Nov 4, 2022	Experimental DesignLanguage Modelling	—Unverified	0	0
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients	May 3, 2025	GSM8KMMLU	—Unverified	0	0
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs	Oct 15, 2024	GSM8KMath	—Unverified	0	0
Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning	May 15, 2025	Continual PretrainingMMLU	—Unverified	0	0
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures	Jun 3, 2024	ChatbotMMLU	—Unverified	0	0
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design	Dec 19, 2024	MMLUQuantization	—Unverified	0	0
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference	Nov 27, 2024	GSM8KLanguage Modeling	—Unverified	0	0
Watson: A Cognitive Observability Framework for the Reasoning of LLM-Powered Agents	Nov 5, 2024	MMLU	—Unverified	0	0
An Assessment of Model-On-Model Deception	May 10, 2024	MMLUmodel	—Unverified	0	0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers	Jun 5, 2025	GSM8KMath	—Unverified	0	0

Show:10 25 50

← PrevPage 33 of 34Next →

All datasets SIOP 2020/2021 MMLU-Pro VCTK

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	go ahead, make my data	Final_score	61.72	—	Unverified
2	#GreedyCow	Final_score	61.63	—	Unverified
3	Don't Ask Us y	Final_score	61.4	—	Unverified
4	Data_and_Confused	Final_score	60.96	—	Unverified
5	Waffles	Final_score	60.91	—	Unverified
6	raaka	Final_score	60.91	—	Unverified
7	Team Procrustination	Final_score	60.64	—	Unverified
8	Axiom Consulting Partners	Final_score	60.63	—	Unverified
9	Lets_Be_Fair	Final_score	60.23	—	Unverified
10	gooners	Final_score	60.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Orange-mini	0-shot MRR	99.19	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HybridBeam+	SI-SDRi	13.3	—	Unverified