
Multi-task Language Understanding

The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf
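For a concrete sense of what evaluating on this benchmark involves, the sketch below scores one MMLU subject as four-way multiple-choice accuracy. It is a minimal sketch, assuming the Hugging Face `datasets` library and the community-hosted `cais/mmlu` dataset ID; `score_question` is a hypothetical placeholder for whatever model call you actually use, and the field names (`question`, `choices`, `answer`) reflect that dataset's schema as I understand it.

```python
# Minimal sketch: per-subject MMLU accuracy as four-way multiple choice.
# Assumes the Hugging Face `datasets` library and the community-hosted
# "cais/mmlu" dataset; `score_question` is a hypothetical model call.
from datasets import load_dataset

CHOICES = ["A", "B", "C", "D"]

def score_question(question: str, choices: list[str]) -> str:
    """Hypothetical: return the model's predicted letter ("A".."D") for one item."""
    raise NotImplementedError("plug in your model here")

def mmlu_subject_accuracy(subject: str = "college_computer_science") -> float:
    # Each of the 57 tasks is a separate config; "all" concatenates them.
    ds = load_dataset("cais/mmlu", subject, split="test")
    correct = sum(
        score_question(row["question"], row["choices"]) == CHOICES[row["answer"]]
        for row in ds  # `answer` holds the 0-3 index of the gold choice
    )
    return correct / len(ds)
```

Reported MMLU scores are typically the average of this per-subject accuracy over all 57 tasks, so a full evaluation would run the function above in a loop over subjects.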

Papers

Showing 21–30 of 57 papers

Title | Status | Hype
Solving Quantitative Reasoning Problems with Language Models | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
Measuring Massive Multitask Language Understanding | Code | 2
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Code | 2
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages | Code | 1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
Gemini: A Family of Highly Capable Multimodal Models | Code | 1
MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models | Code | 1

No leaderboard results yet.