
Multi-task Language Understanding

The test covers 57 tasks, including elementary mathematics, US history, computer science, law, and more (https://arxiv.org/pdf/2009.03300.pdf).
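Each MMLU question is four-way multiple choice, so a model's score on this benchmark is plain accuracy over the test split, and the headline MMLU figure is typically that accuracy averaged across all 57 tasks. Below is a minimal evaluation sketch, assuming the `cais/mmlu` copy of the dataset on the Hugging Face Hub; `model_predict` is a hypothetical stand-in for whichever model is being scored.

```python
# Minimal sketch of scoring a model on one MMLU subject.
# Assumes the `cais/mmlu` dataset on the Hugging Face Hub, whose examples
# carry "question" (str), "choices" (list of 4 str), and "answer" (int 0-3).
from datasets import load_dataset

CHOICES = ["A", "B", "C", "D"]

def format_question(example):
    # Render one example as a multiple-choice prompt ending in "Answer:".
    lines = [example["question"]]
    lines += [f"{label}. {text}" for label, text in zip(CHOICES, example["choices"])]
    lines.append("Answer:")
    return "\n".join(lines)

def model_predict(prompt):
    # Hypothetical stand-in: should return "A", "B", "C", or "D"
    # from the model under evaluation.
    raise NotImplementedError

def mmlu_accuracy(subject="elementary_mathematics"):
    # Accuracy over the test split of a single subject.
    ds = load_dataset("cais/mmlu", subject, split="test")
    correct = sum(
        model_predict(format_question(ex)) == CHOICES[ex["answer"]]
        for ex in ds
    )
    return correct / len(ds)
```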

Papers

Showing 1-25 of 57 papers

Title | Status | Hype
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Code | 15
Llama 2: Open Foundation and Fine-Tuned Chat Models | Code | 8
LLaMA: Open and Efficient Foundation Language Models | Code | 7
GPT-4 Technical Report | Code | 6
Mistral 7B | Code | 6
GLM-130B: An Open Bilingual Pre-trained Model | Code | 6
Training Compute-Optimal Large Language Models | Code | 6
The Llama 3 Herd of Models | Code | 4
Mixtral of Experts | Code | 4
Galactica: A Large Language Model for Science | Code | 4
REPLUG: Retrieval-Augmented Black-Box Language Models | Code | 3
Scaling Instruction-Finetuned Language Models | Code | 3
Evaluating Large Language Models Trained on Code | Code | 3
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Code | 3
Language Models are Few-Shot Learners | Code | 3
Solving Quantitative Reasoning Problems with Language Models | Code | 2
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | Code | 2
Measuring Massive Multitask Language Understanding | Code | 2
Atlas: Few-shot Learning with Retrieval Augmented Language Models | Code | 2
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Code | 2
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Code | 2
Routoo: Learning to Route to Large Language Models Effectively | Code | 2
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2

Leaderboard

No leaderboard results yet.