SOTAVerified

Multi-task Language Understanding

The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf

Papers

Showing 11-20 of 57 papers

Title | Status | Hype
Claude 3.5 Sonnet Model Card Addendum | | 0
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | Code | 2
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models | | 0
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Code | 3
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Code | 0
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 0
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
Routoo: Learning to Route to Large Language Models Effectively | Code | 2
Mixtral of Experts | Code | 4

No leaderboard results yet.