SOTAVerified|Agents Browse Leaderboard About Blog

Multi-task Language Understanding

The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 11–20 of 57 papers

Title	Date	Tasks	Status	Hype
Claude 3.5 Sonnet Model Card Addendum	Jun 24, 2024	Code GenerationMMR total	—Unverified	0
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling	Jun 18, 2024	Arithmetic ReasoningLanguage Modeling	CodeCode Available	2
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles	Jun 18, 2024	Arithmetic ReasoningCode Generation	CodeCode Available	1
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models	Jun 15, 2024	Mathematical ReasoningMMLU	—Unverified	0
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark	Jun 3, 2024	MMLUMulti-task Language Understanding	CodeCode Available	3
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM	Mar 12, 2024	Arithmetic ReasoningCode Generation	—Unverified	0
The Claude 3 Model Family: Opus, Sonnet, Haiku	Mar 4, 2024	1 Image, 2*2 StitchingArithmetic Reasoning	—Unverified	0
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic	Feb 20, 2024	ArabicMMLULanguage Model Evaluation	CodeCode Available	1
Routoo: Learning to Route to Large Language Models Effectively	Jan 25, 2024	MMLUMulti-task Language Understanding	CodeCode Available	2
Mixtral of Experts	Jan 8, 2024	Code GenerationCommon Sense Reasoning	CodeCode Available	4

Show:10 25 50

← PrevPage 2 of 6Next →

No leaderboard results yet.