SOTAVerified

Multi-task Language Understanding

The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf

Papers

Showing 1-10 of 57 papers

Title | Status | Hype
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Code | 15
Llama 2: Open Foundation and Fine-Tuned Chat Models | Code | 8
LLaMA: Open and Efficient Foundation Language Models | Code | 7
Mistral 7B | Code | 6
Training Compute-Optimal Large Language Models | Code | 6
GPT-4 Technical Report | Code | 6
GLM-130B: An Open Bilingual Pre-trained Model | Code | 6
The Llama 3 Herd of Models | Code | 4
Galactica: A Large Language Model for Science | Code | 4
Mixtral of Experts | Code | 4

No leaderboard results yet.