Multi-task Language Understanding

The test covers 57 tasks, including elementary mathematics, US history, computer science, and law. Source: https://arxiv.org/pdf/2009.03300.pdf

Papers

Showing 51–57 of 57 papers

Title	Date	Tasks	Status	Hype	Score
The Falcon Series of Open Language Models	Nov 28, 2023	Decoder, Multi-task Language Understanding	Unverified	0	0
Claude 3.5 Sonnet Model Card Addendum	Jun 24, 2024	Code Generation, MMR total	Unverified	0	0
Measuring Hong Kong Massive Multi-Task Language Understanding	May 4, 2025	MMLU, Multi-task Language Understanding	Unverified	0	0
Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning	Aug 16, 2024	counterfactual, MMLU	Unverified	0	0
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models	Jun 15, 2024	Mathematical Reasoning, MMLU	Unverified	0	0
Orca 2: Teaching Small Language Models How to Reason	Nov 18, 2023	Arithmetic Reasoning, Common Sense Reasoning	Unverified	0	0
Transcending Scaling Laws with 0.1% Extra Compute	Oct 20, 2022	Arithmetic Reasoning, Cross-Lingual Question Answering	Unverified	0	0

No leaderboard results yet.