
Multi-task Language Understanding

The test covers 57 tasks, including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf
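Each MMLU item is a four-option multiple-choice question belonging to one of the 57 tasks, and the headline number is typically accuracy averaged across tasks. A minimal sketch of that scoring scheme, assuming a simple list-of-dicts item format (the example questions and the `mmlu_accuracy` helper are illustrative, not part of the benchmark's official tooling):

```python
# Sketch of MMLU-style scoring: per-task accuracy on four-option
# multiple-choice items, macro-averaged across tasks.
from collections import defaultdict

def mmlu_accuracy(items, predictions):
    """Return (per-task accuracy, macro average over tasks).

    items: list of dicts with 'task' and 'answer' (gold index 0-3).
    predictions: list of predicted answer indices, same order as items.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item["task"]] += 1
        correct[item["task"]] += int(pred == item["answer"])
    per_task = {t: correct[t] / total[t] for t in total}
    macro = sum(per_task.values()) / len(per_task)
    return per_task, macro

# Illustrative items only -- not real benchmark questions.
items = [
    {"task": "elementary_mathematics", "answer": 2},
    {"task": "elementary_mathematics", "answer": 0},
    {"task": "us_history", "answer": 1},
]
per_task, macro = mmlu_accuracy(items, [2, 1, 1])
print(per_task, macro)
```

Macro-averaging means every task contributes equally regardless of how many questions it has, which is why small tasks can move the headline score noticeably.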

Papers

Showing 1-10 of 57 papers

Title | Status | Hype
Measuring Hong Kong Massive Multi-Task Language Understanding | - | 0
Effectiveness of Zero-shot-CoT in Japanese Prompts | - | 0
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages | Code | 1
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | - | 0
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Code | 15
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Code | 2
Llama 3 Meets MoE: Efficient Upcycling | Code | 0
GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | - | 0
Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning | - | 0
The Llama 3 Herd of Models | Code | 4

No leaderboard results yet.