SOTAVerified

MMLU

Papers

Showing 191-200 of 340 papers

| Title | Status | Hype |
|---|---|---|
| GRIN: GRadient-INformed MoE | | 0 |
| HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | | 0 |
| Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models | | 0 |
| Humanity's Last Exam | | 0 |
| Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents | | 0 |
| Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training | | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | | 0 |
| Actor-Critic based Online Data Mixing For Language Model Pre-Training | | 0 |
| Revisiting Uncertainty Estimation and Calibration of Large Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | go ahead, make my data | Final_score | 61.72 | | Unverified |
| 2 | #GreedyCow | Final_score | 61.63 | | Unverified |
| 3 | Don't Ask Us y | Final_score | 61.4 | | Unverified |
| 4 | Data_and_Confused | Final_score | 60.96 | | Unverified |
| 5 | Waffles | Final_score | 60.91 | | Unverified |
| 6 | raaka | Final_score | 60.91 | | Unverified |
| 7 | Team Procrustination | Final_score | 60.64 | | Unverified |
| 8 | Axiom Consulting Partners | Final_score | 60.63 | | Unverified |
| 9 | Lets_Be_Fair | Final_score | 60.23 | | Unverified |
| 10 | gooners | Final_score | 60.22 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Orange-mini | 0-shot MRR | 99.19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | HybridBeam+ | SI-SDRi | 13.3 | | Unverified |