HellaSwag

Papers

Showing 1–25 of 39 papers

Title | Status | Hype
Training Compute-Optimal Large Language Models | Code | 6
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3
DataDecide: How to Predict Best Pretraining Data with Small Experiments | Code | 3
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | Code | 1
When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation | Code | 1
An Open Source Data Contamination Report for Large Language Models | Code | 1
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | Code | 1
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Code | 1
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment | | 0
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0
Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning | | 0
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | | 0
Promises, Outlooks and Challenges of Diffusion Language Modeling | | 0
English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too | | 0
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0
Slimming Down LLMs Without Losing Their Minds | | 0
Comparing Test Sets with Item Response Theory | | 0
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | | 0
Contrastive Decoding Improves Reasoning in Large Language Models | | 0
Towards Multilingual LLM Evaluation for European Languages | | 0
GRIN: GRadient-INformed MoE | | 0
When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation | | 0
Who's Harry Potter? Approximate Unlearning in LLMs | | 0

Leaderboard

No leaderboard results yet.