
Multi-task Language Understanding

The test covers 57 tasks, including elementary mathematics, US history, computer science, and law. Source: https://arxiv.org/pdf/2009.03300.pdf

Papers


Showing 41–50 of 57 papers

Title	Date	Tasks	Status	Hype	Score
BloombergGPT: A Large Language Model for Finance	Mar 30, 2023	Causal Judgment; Common Sense Reasoning	Code Available	0	5
Textbooks Are All You Need II: phi-1.5 technical report	Sep 11, 2023	All; Code Generation	Code Available	0	5
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM	Mar 12, 2024	Arithmetic Reasoning; Code Generation	Code Available	0	5
PaLM 2 Technical Report	May 17, 2023	Code Generation; Common Sense Reasoning	Code Available	0	5
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning	Jun 25, 2023	counterfactual; Math	Unverified	0	0
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding	Jan 27, 2025	Benchmarking; Diversity	Unverified	0	0
GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data	Oct 3, 2024	Active Learning; Language Modeling	Unverified	0	0
Effectiveness of Zero-shot-CoT in Japanese Prompts	Mar 9, 2025	Abstract Algebra; College Mathematics	Unverified	0	0
Model Card and Evaluations for Claude Models	Jul 11, 2023	Arithmetic Reasoning; Bug fixing	Unverified	0	0
The Claude 3 Model Family: Opus, Sonnet, Haiku	Mar 4, 2024	1 Image, 2*2 Stitching; Arithmetic Reasoning	Unverified	0	0



No leaderboard results yet.