
Multi-task Language Understanding

The test covers 57 tasks, including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf
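Each MMLU item is a four-option multiple-choice question belonging to one of the 57 tasks, and the headline number is typically accuracy averaged across tasks. A minimal sketch of that scoring scheme, assuming a simple list-of-dicts item format (the example questions and the `mmlu_accuracy` helper are illustrative, not part of the benchmark's official tooling):

```python
# Sketch of MMLU-style scoring: per-task accuracy on four-option
# multiple-choice items, macro-averaged across tasks.
from collections import defaultdict

def mmlu_accuracy(items, predictions):
    """Return (per-task accuracy, macro average over tasks).

    items: list of dicts with 'task' and 'answer' (gold index 0-3).
    predictions: list of predicted answer indices, same order as items.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item["task"]] += 1
        correct[item["task"]] += int(pred == item["answer"])
    per_task = {t: correct[t] / total[t] for t in total}
    macro = sum(per_task.values()) / len(per_task)
    return per_task, macro

# Illustrative items only -- not real benchmark questions.
items = [
    {"task": "elementary_mathematics", "answer": 2},
    {"task": "elementary_mathematics", "answer": 0},
    {"task": "us_history", "answer": 1},
]
per_task, macro = mmlu_accuracy(items, [2, 1, 1])
print(per_task, macro)
```

Macro-averaging means every task contributes equally regardless of how many questions it has, which is why small tasks can move the headline score noticeably.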

Papers

Showing 1-10 of 57 papers

Title | Status | Hype
Measuring Hong Kong Massive Multi-Task Language Understanding | - | 0
Effectiveness of Zero-shot-CoT in Japanese Prompts | - | 0
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages | Code | 1
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | - | 0
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Code | 15
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Code | 2
Llama 3 Meets MoE: Efficient Upcycling | Code | 0
GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | - | 0
Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning | - | 0
The Llama 3 Herd of Models | Code | 4

No leaderboard results yet.