SOTAVerified

MMLU

Papers

Showing 226–250 of 340 papers

Title | Status | Hype
Unraveling Indirect In-Context Learning Using Influence Functions | | 0
Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach | | 0
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs | | 0
Upcycling Large Language Models into Mixture of Experts | | 0
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content | | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0
BrainTransformers: SNN-LLM | | 0
B-score: Detecting biases in large language models using response history | | 0
ChainRank-DPO: Chain Rank Direct Preference Optimization for LLM Rankers | | 0
Changing Answer Order Can Decrease MMLU Accuracy | | 0
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation | | 0
Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | | 0
Continuous Approximations for Improving Quantization Aware Training of LLMs | | 0
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks | | 0
Cost-aware LLM-based Online Dataset Annotation | | 0
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning | | 0
Cost-Saving LLM Cascades with Early Abstention | | 0
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | | 0
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation | | 0
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting | | 0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0
GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs | | 0
Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling | | 0
DEM: Distribution Edited Model for Training with Mixed Data Distributions | | 0
Detecting Benchmark Contamination Through Watermarking | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | go ahead, make my data | Final_score | 61.72 | | Unverified
2 | #GreedyCow | Final_score | 61.63 | | Unverified
3 | Don't Ask Us y | Final_score | 61.4 | | Unverified
4 | Data_and_Confused | Final_score | 60.96 | | Unverified
5 | Waffles | Final_score | 60.91 | | Unverified
6 | raaka | Final_score | 60.91 | | Unverified
7 | Team Procrustination | Final_score | 60.64 | | Unverified
8 | Axiom Consulting Partners | Final_score | 60.63 | | Unverified
9 | Lets_Be_Fair | Final_score | 60.23 | | Unverified
10 | gooners | Final_score | 60.22 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orange-mini | 0-shot MRR | 99.19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HybridBeam+ | SI-SDRi | 13.3 | | Unverified