SOTAVerified

MMLU

Papers

Showing 301–340 of 340 papers

Title | Status | Hype
LM-Cocktail: Resilient Tuning of Language Models via Model Merging | | 0
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization | Code | 1
AcademicGPT: Empowering Academic Research | | 0
Investigating Data Contamination in Modern Benchmarks for Large Language Models | | 0
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | | 0
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning | Code | 2
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Code | 2
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | | 0
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | | 0
An Open Source Data Contamination Report for Large Language Models | Code | 1
Evaluation of large language models using an Indian language LGBTI+ lexicon | | 0
Irreducible Curriculum for Language Model Pretraining | | 0
Instruction Tuning with Human Curriculum | | 0
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models | | 0
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models | Code | 1
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Code | 1
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Code | 0
Baichuan 2: Open Large-scale Language Models | Code | 4
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch | Code | 1
Pruning Large Language Models via Accuracy Predictor | | 0
Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations | Code | 0
The Poison of Alignment | | 0
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Code | 1
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning | | 0
Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In | Code | 1
The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models | Code | 1
Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers | Code | 1
Towards Expert-Level Medical Question Answering with Large Language Models | Code | 1
From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning | Code | 1
ART: Automatic multi-step reasoning and tool-use for large language models | Code | 6
REPLUG: Retrieval-Augmented Black-Box Language Models | Code | 3
Inconsistencies in Masked Language Models | Code | 0
Large Language Models Encode Clinical Knowledge | Code | 1
Galactica: A Large Language Model for Science | Code | 4
Measuring Progress on Scalable Oversight for Large Language Models | | 0
Scaling Instruction-Finetuned Language Models | Code | 3
Transcending Scaling Laws with 0.1% Extra Compute | | 0
Atlas: Few-shot Learning with Retrieval Augmented Language Models | Code | 2
UL2: Unifying Language Learning Paradigms | Code | 1
Training Compute-Optimal Large Language Models | Code | 6
Page 7 of 7

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | go ahead, make my data | Final_score | 61.72 | | Unverified
2 | #GreedyCow | Final_score | 61.63 | | Unverified
3 | Don't Ask Us y | Final_score | 61.4 | | Unverified
4 | Data_and_Confused | Final_score | 60.96 | | Unverified
5 | Waffles | Final_score | 60.91 | | Unverified
6 | raaka | Final_score | 60.91 | | Unverified
7 | Team Procrustination | Final_score | 60.64 | | Unverified
8 | Axiom Consulting Partners | Final_score | 60.63 | | Unverified
9 | Lets_Be_Fair | Final_score | 60.23 | | Unverified
10 | gooners | Final_score | 60.22 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Orange-mini | 0-shot MRR | 99.19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HybridBeam+ | SI-SDRi | 13.3 | | Unverified