SOTAVerified

MMLU

Papers

Showing 301-310 of 340 papers

Title | Status | Hype
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama | | 0
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | | 0
Large Language Models Could Be Rote Learners | | 0
Large Language Models Often Know When They Are Being Evaluated | | 0
Learning from "Silly" Questions Improves Large Language Models, But Only Slightly | | 0
Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning | | 0
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning | | 0
Leveraging Approximate Caching for Faster Retrieval-Augmented Generation | | 0
Leveraging Uncertainty Estimation for Efficient LLM Routing | | 0
Lizard: An Efficient Linearization Framework for Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | go ahead, make my data | Final_score | 61.72 | | Unverified
2 | #GreedyCow | Final_score | 61.63 | | Unverified
3 | Don't Ask Us y | Final_score | 61.4 | | Unverified
4 | Data_and_Confused | Final_score | 60.96 | | Unverified
5 | Waffles | Final_score | 60.91 | | Unverified
6 | raaka | Final_score | 60.91 | | Unverified
7 | Team Procrustination | Final_score | 60.64 | | Unverified
8 | Axiom Consulting Partners | Final_score | 60.63 | | Unverified
9 | Lets_Be_Fair | Final_score | 60.23 | | Unverified
10 | gooners | Final_score | 60.22 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orange-mini | 0-shot MRR | 99.19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HybridBeam+ | SI-SDRi | 13.3 | | Unverified