SOTAVerified

MMLU

Papers

Showing 291-300 of 340 papers

Title | Status | Hype
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results |  | 0
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models |  | 0
Quantifying Variance in Evaluation Benchmarks |  | 0
GEB-1.3B: Open Lightweight Large Language Model |  | 0
An Empirical Study of Mamba-based Language Models |  | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training |  | 0
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function | Code | 0
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures |  | 0
Spanish and LLM Benchmarks: is MMLU Lost in Translation? |  | 0
GECKO: Generative Language Model for English, Code and Korean |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | go ahead, make my data | Final_score | 61.72 |  | Unverified
2 | #GreedyCow | Final_score | 61.63 |  | Unverified
3 | Don't Ask Us y | Final_score | 61.4 |  | Unverified
4 | Data_and_Confused | Final_score | 60.96 |  | Unverified
5 | Waffles | Final_score | 60.91 |  | Unverified
6 | raaka | Final_score | 60.91 |  | Unverified
7 | Team Procrustination | Final_score | 60.64 |  | Unverified
8 | Axiom Consulting Partners | Final_score | 60.63 |  | Unverified
9 | Lets_Be_Fair | Final_score | 60.23 |  | Unverified
10 | gooners | Final_score | 60.22 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Orange-mini | 0-shot MRR | 99.19 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | HybridBeam+ | SI-SDRi | 13.3 |  | Unverified