SOTAVerified

Multi-task Language Understanding

The test covers 57 tasks, including elementary mathematics, US history, computer science, and law. Source: https://arxiv.org/pdf/2009.03300.pdf

Papers

Showing 41-50 of 57 papers

Title | Status | Hype
BloombergGPT: A Large Language Model for Finance | Code | 0
Textbooks Are All You Need II: phi-1.5 technical report | Code | 0
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Code | 0
PaLM 2 Technical Report | Code | 0
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning |  | 0
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding |  | 0
GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data |  | 0
Effectiveness of Zero-shot-CoT in Japanese Prompts |  | 0
Model Card and Evaluations for Claude Models |  | 0
The Claude 3 Model Family: Opus, Sonnet, Haiku |  | 0
Page 5 of 6

No leaderboard results yet.