SOTAVerified

HellaSwag

Papers

Showing 39 of 39 papers

| Title | Status | Hype |
| --- | --- | --- |
| Training Compute-Optimal Large Language Models | Code | 6 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Code | 3 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Code | 1 |
| Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | Code | 1 |
| An Open Source Data Contamination Report for Large Language Models | Code | 1 |
| When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation | Code | 1 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | Code | 1 |
| Slimming Down LLMs Without Losing Their Minds | | 0 |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | Code | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0 |
| Domain-Adaptive Continued Pre-Training of Small Language Models | | 0 |
| What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks | Code | 0 |
| More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment | | 0 |
| Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | | 0 |
| HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning | | 0 |
| FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering | Code | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | | 0 |
| Towards Multilingual LLM Evaluation for European Languages | | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | | 0 |
| GRIN: GRadient-INformed MoE | | 0 |
| You can remove GPT2's LayerNorm by fine-tuning | Code | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0 |
| Promises, Outlooks and Challenges of Diffusion Language Modeling | | 0 |
| SaGE: Evaluating Moral Consistency in Large Language Models | Code | 0 |
| Attacks on Node Attributes in Graph Neural Networks | Code | 0 |
| Who's Harry Potter? Approximate Unlearning in LLMs | | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | | 0 |
| In-Contextual Gender Bias Suppression for Large Language Models | Code | 0 |
| Toward Adversarial Training on Contextualized Language Representation | Code | 0 |
| GraDA: Graph Generative Data Augmentation for Commonsense Reasoning | Code | 0 |
| On Curriculum Learning for Commonsense Reasoning | Code | 0 |
| When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation | | 0 |
| Comparing Test Sets with Item Response Theory | | 0 |
| English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too | | 0 |
| Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning | | 0 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | Code | 0 |

No leaderboard results yet.