SOTAVerified|Agents Browse Leaderboard About

MMLU

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 31–40 of 340 papers

Title	Date	Tasks	Status	Hype	Score
Reinforcing General Reasoning without Verifiers	May 27, 2025	MathMathematical Reasoning	CodeCode Available	2	5
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention	Feb 8, 2024	MMLUQuantization	CodeCode Available	2	5
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples	Nov 8, 2023	HumanEvalMMLU	CodeCode Available	2	5
A StrongREJECT for Empty Jailbreaks	Feb 15, 2024	MMLU	CodeCode Available	2	5
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs	Apr 21, 2024	MMLURed Teaming	CodeCode Available	2	5
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark	Dec 19, 2024	MMLUMultiple-choice	CodeCode Available	2	5
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning	Nov 16, 2023	MedQAMMLU	CodeCode Available	2	5
Inheritune: Training Smaller Yet More Attentive Language Models	Apr 12, 2024	DecoderLanguage Modelling	CodeCode Available	2	5
Routoo: Learning to Route to Large Language Models Effectively	Jan 25, 2024	MMLUMulti-task Language Understanding	CodeCode Available	2	5
any4: Learned 4-bit Numeric Representation for LLMs	Jul 7, 2025	GPUGSM8K	CodeCode Available	2	5

Show:10 25 50

← PrevPage 4 of 34Next →

All datasets SIOP 2020/2021 MMLU-Pro VCTK

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	go ahead, make my data	Final_score	61.72	—	Unverified
2	#GreedyCow	Final_score	61.63	—	Unverified
3	Don't Ask Us y	Final_score	61.4	—	Unverified
4	Data_and_Confused	Final_score	60.96	—	Unverified
5	raaka	Final_score	60.91	—	Unverified
6	Waffles	Final_score	60.91	—	Unverified
7	Team Procrustination	Final_score	60.64	—	Unverified
8	Axiom Consulting Partners	Final_score	60.63	—	Unverified
9	Lets_Be_Fair	Final_score	60.23	—	Unverified
10	gooners	Final_score	60.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Orange-mini	0-shot MRR	99.19	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HybridBeam+	SI-SDRi	13.3	—	Unverified