MMLU

Papers

Showing 121–130 of 340 papers

Title	Date	Tasks	Status	Hype	Score
Inconsistencies in Masked Language Models	Dec 30, 2022	LAMBADA, MMLU	Code Available	0	5
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs	Sep 3, 2024	MMLU	Code Available	0	5
CHAIR -- Classifier of Hallucination as Improver	Jan 5, 2025	Hallucination, MMLU	Code Available	0	5
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate	Jul 8, 2025	Continual Learning, Mixture-of-Experts	Code Available	0	5
Effective Skill Unlearning through Intervention and Abstention	Mar 27, 2025	General Knowledge, Math	Code Available	0	5
ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling	Feb 21, 2024	MMLU, Retrieval	Code Available	0	5
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective	Feb 20, 2025	GSM8K, Math	Code Available	0	5
Capability-Based Scaling Laws for LLM Red-Teaming	May 26, 2025	MMLU, Prompt Engineering	Code Available	0	5
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors	May 29, 2025	MMLU, Multiple-choice	Code Available	0	5
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function	Jun 3, 2024	Diversity, MMLU	Code Available	0	5

Datasets: SIOP 2020/2021, MMLU-Pro, VCTK

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	go ahead, make my data	Final_score	61.72	—	Unverified
2	#GreedyCow	Final_score	61.63	—	Unverified
3	Don't Ask Us y	Final_score	61.4	—	Unverified
4	Data_and_Confused	Final_score	60.96	—	Unverified
5	Waffles	Final_score	60.91	—	Unverified
6	raaka	Final_score	60.91	—	Unverified
7	Team Procrustination	Final_score	60.64	—	Unverified
8	Axiom Consulting Partners	Final_score	60.63	—	Unverified
9	Lets_Be_Fair	Final_score	60.23	—	Unverified
10	gooners	Final_score	60.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Orange-mini	0-shot MRR	99.19	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HybridBeam+	SI-SDRi	13.3	—	Unverified