SOTAVerified

MMLU

Papers

Showing 131–140 of 340 papers

Title | Status | Hype
DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance | Code | 0
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding |  | 0
HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI |  | 0
Humanity's Last Exam |  | 0
On the Reasoning Capacity of AI Models and How to Quantify It |  | 0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs |  | 0
MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Code | 1
Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy | Code | 0
Control LLM: Controlled Evolution for Intelligence Retention in LLM | Code | 1
DNA 1.0 Technical Report |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | go ahead, make my data | Final_score | 61.72 |  | Unverified
2 | #GreedyCow | Final_score | 61.63 |  | Unverified
3 | Don't Ask Us y | Final_score | 61.4 |  | Unverified
4 | Data_and_Confused | Final_score | 60.96 |  | Unverified
5 | Waffles | Final_score | 60.91 |  | Unverified
6 | raaka | Final_score | 60.91 |  | Unverified
7 | Team Procrustination | Final_score | 60.64 |  | Unverified
8 | Axiom Consulting Partners | Final_score | 60.63 |  | Unverified
9 | Lets_Be_Fair | Final_score | 60.23 |  | Unverified
10 | gooners | Final_score | 60.22 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orange-mini | 0-shot MRR | 99.19 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HybridBeam+ | SI-SDRi | 13.3 |  | Unverified