SOTAVerified

MMLU

Papers

Showing 301-325 of 340 papers

Title | Status | Hype
LM-Cocktail: Resilient Tuning of Language Models via Model Merging | - | 0
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization | Code | 1
AcademicGPT: Empowering Academic Research | - | 0
Investigating Data Contamination in Modern Benchmarks for Large Language Models | - | 0
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | - | 0
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning | Code | 2
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Code | 2
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | - | 0
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | - | 0
An Open Source Data Contamination Report for Large Language Models | Code | 1
Evaluation of large language models using an Indian language LGBTI+ lexicon | - | 0
Irreducible Curriculum for Language Model Pretraining | - | 0
Instruction Tuning with Human Curriculum | - | 0
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models | - | 0
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models | Code | 1
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Code | 1
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Code | 0
Baichuan 2: Open Large-scale Language Models | Code | 4
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch | Code | 1
Pruning Large Language Models via Accuracy Predictor | - | 0
Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations | Code | 0
The Poison of Alignment | - | 0
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Code | 1
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning | - | 0
Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | go ahead, make my data | Final_score | 61.72 | - | Unverified
2 | #GreedyCow | Final_score | 61.63 | - | Unverified
3 | Don't Ask Us y | Final_score | 61.4 | - | Unverified
4 | Data_and_Confused | Final_score | 60.96 | - | Unverified
5 | Waffles | Final_score | 60.91 | - | Unverified
6 | raaka | Final_score | 60.91 | - | Unverified
7 | Team Procrustination | Final_score | 60.64 | - | Unverified
8 | Axiom Consulting Partners | Final_score | 60.63 | - | Unverified
9 | Lets_Be_Fair | Final_score | 60.23 | - | Unverified
10 | gooners | Final_score | 60.22 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orange-mini | 0-shot MRR | 99.19 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HybridBeam+ | SI-SDRi | 13.3 | - | Unverified