MMLU

Papers

Showing 131–140 of 340 papers

Title	Date	Tasks	Status	Hype	Score
Inconsistencies in Masked Language Models	Dec 30, 2022	LAMBADA, MMLU	Code Available	0	5
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function	Jun 3, 2024	Diversity, MMLU	Code Available	0	5
Review-Instruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models	May 16, 2025	Diversity, MMLU	Code Available	0	5
OpenGrok: Enhancing SNS Data Processing with Distilled Knowledge and Mask-like Mechanisms	Feb 11, 2025	Knowledge Distillation, MMLU	Code Available	0	5
Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs	May 31, 2025	MMLU	Code Available	0	5
BnMMLU: Measuring Massive Multitask Language Understanding in Bengali	May 25, 2025	General Knowledge, MMLU	Code Available	0	5
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs	Sep 3, 2024	MMLU	Code Available	0	5
Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning	Oct 14, 2024	In-Context Learning, MMLU	Code Available	0	5
DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance	Jan 29, 2025	Diversity, MMLU	Code Available	0	5
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning	Feb 27, 2024	8k, Language Modeling	Code Available	0	5

All datasets SIOP 2020/2021 MMLU-Pro VCTK

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	go ahead, make my data	Final_score	61.72	—	Unverified
2	#GreedyCow	Final_score	61.63	—	Unverified
3	Don't Ask Us y	Final_score	61.4	—	Unverified
4	Data_and_Confused	Final_score	60.96	—	Unverified
5	Waffles	Final_score	60.91	—	Unverified
6	raaka	Final_score	60.91	—	Unverified
7	Team Procrustination	Final_score	60.64	—	Unverified
8	Axiom Consulting Partners	Final_score	60.63	—	Unverified
9	Lets_Be_Fair	Final_score	60.23	—	Unverified
10	gooners	Final_score	60.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Orange-mini	0-shot MRR	99.19	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HybridBeam+	SI-SDRi	13.3	—	Unverified