SOTAVerified

Multi-task Language Understanding

The test covers 57 tasks, including elementary mathematics, US history, computer science, law, and more. Source: https://arxiv.org/pdf/2009.03300.pdf

Papers

Showing 51–57 of 57 papers

| Title | Status | Hype |
| --- | --- | --- |
| The Falcon Series of Open Language Models | | 0 |
| Claude 3.5 Sonnet Model Card Addendum | | 0 |
| Measuring Hong Kong Massive Multi-Task Language Understanding | | 0 |
| Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning | | 0 |
| MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models | | 0 |
| Orca 2: Teaching Small Language Models How to Reason | | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | | 0 |

No leaderboard results yet.