
Multi-task Language Understanding

The MMLU benchmark covers 57 tasks, including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf

Papers

Showing 11–20 of 57 papers

| Title | Status | Hype |
| --- | --- | --- |
| MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Code | 3 |
| REPLUG: Retrieval-Augmented Black-Box Language Models | Code | 3 |
| Scaling Instruction-Finetuned Language Models | Code | 3 |
| Evaluating Large Language Models Trained on Code | Code | 3 |
| Language Models are Few-Shot Learners | Code | 3 |
| MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Code | 2 |
| Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | Code | 2 |
| Routoo: Learning to Route to Large Language Models Effectively | Code | 2 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Code | 2 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | Code | 2 |
Page 2 of 6

No leaderboard results yet.