SOTAVerified

MMLU

Papers

Showing 151-175 of 340 papers

Title | Status | Hype
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks |  | 0
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation |  | 0
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting |  | 0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection |  | 0
GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs |  | 0
Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling |  | 0
DEM: Distribution Edited Model for Training with Mixed Data Distributions |  | 0
Detecting Benchmark Contamination Through Watermarking |  | 0
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? |  | 0
Distributional Scaling Laws for Emergent Capabilities |  | 0
DNA 1.0 Technical Report |  | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training |  | 0
Do Large Language Models Mirror Cognitive Language Processing? |  | 0
Domain-Adaptive Continued Pre-Training of Small Language Models |  | 0
DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining |  | 0
Dual Decomposition of Weights and Singular Value Low Rank Adaptation |  | 0
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge |  | 0
Effectiveness of Zero-shot-CoT in Japanese Prompts |  | 0
Efficient Data Selection at Scale via Influence Distillation |  | 0
Efficient Federated Search for Retrieval-Augmented Generation |  | 0
Efficiently Deploying LLMs with Controlled Risk |  | 0
Efficient Model Development through Fine-tuning Transfer |  | 0
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities |  | 0
Eir: Thai Medical Large Language Models |  | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | go ahead, make my data | Final_score | 61.72 |  | Unverified
2 | #GreedyCow | Final_score | 61.63 |  | Unverified
3 | Don't Ask Us y | Final_score | 61.4 |  | Unverified
4 | Data_and_Confused | Final_score | 60.96 |  | Unverified
5 | Waffles | Final_score | 60.91 |  | Unverified
6 | raaka | Final_score | 60.91 |  | Unverified
7 | Team Procrustination | Final_score | 60.64 |  | Unverified
8 | Axiom Consulting Partners | Final_score | 60.63 |  | Unverified
9 | Lets_Be_Fair | Final_score | 60.23 |  | Unverified
10 | gooners | Final_score | 60.22 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orange-mini | 0-shot MRR | 99.19 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HybridBeam+ | SI-SDRi | 13.3 |  | Unverified