SOTAVerified

MMLU

Papers

Showing 231–240 of 340 papers

Title	Date	Tasks	Status	Hype
Towards Multilingual LLM Evaluation for European Languages	Oct 11, 2024	ARC, GSM8K	Unverified	0
Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception	Feb 17, 2025	MMLU, Natural Questions	Unverified	0
Towards Uncertainty-Aware Language Agent	Jan 25, 2024	MMLU, StrategyQA	Unverified	0
Transcending Scaling Laws with 0.1% Extra Compute	Oct 20, 2022	Arithmetic Reasoning, Cross-Lingual Question Answering	Unverified	0
Transferable text data distillation by trajectory matching	Apr 14, 2025	ARC, Large Language Model	Unverified	0
Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests	Feb 20, 2025	Logical Reasoning, MMLU	Unverified	0
Understanding Finetuning for Factual Knowledge Extraction	Jun 20, 2024	MMLU, Question Answering	Unverified	0
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size	Mar 6, 2025	MMLU, Quantization	Unverified	0
Unraveling Indirect In-Context Learning Using Influence Functions	Jan 1, 2025	In-Context Learning, Informativeness	Unverified	0
Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach	Mar 13, 2025	Formal Logic, Mathematical Reasoning	Unverified	0

All datasets SIOP 2020/2021 MMLU-Pro VCTK

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	go ahead, make my data	Final_score	61.72	—	Unverified
2	#GreedyCow	Final_score	61.63	—	Unverified
3	Don't Ask Us y	Final_score	61.4	—	Unverified
4	Data_and_Confused	Final_score	60.96	—	Unverified
5	Waffles	Final_score	60.91	—	Unverified
6	raaka	Final_score	60.91	—	Unverified
7	Team Procrustination	Final_score	60.64	—	Unverified
8	Axiom Consulting Partners	Final_score	60.63	—	Unverified
9	Lets_Be_Fair	Final_score	60.23	—	Unverified
10	gooners	Final_score	60.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Orange-mini	0-shot MRR	99.19	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HybridBeam+	SI-SDRi	13.3	—	Unverified