| Title | Date | Tags | Code | Status | Count |
| --- | --- | --- | --- | --- | --- |
| Detecting Benchmark Contamination Through Watermarking | Feb 24, 2025 | ARC, MMLU | — | Unverified | 0 |
| Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks | Feb 24, 2025 | 2k, ARC | — | Unverified | 0 |
| Distributional Scaling Laws for Emergent Capabilities | Feb 24, 2025 | MMLU | — | Unverified | 0 |
| Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Feb 24, 2025 | Mixture-of-Experts, MMLU | — | Unverified | 0 |
| Swallowing the Poison Pills: Insights from Vulnerability Disparity Among LLMs | Feb 23, 2025 | Data Poisoning, Diagnostic | — | Unverified | 0 |
| Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests | Feb 20, 2025 | Logical Reasoning, MMLU | — | Unverified | 0 |
| Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | Feb 20, 2025 | HellaSwag, Memorization | — | Unverified | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code | Code Available | 0 |
| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | Feb 18, 2025 | Math, Memorization | — | Unverified | 0 |
| Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception | Feb 17, 2025 | MMLU, Natural Questions | — | Unverified | 0 |