
HellaSwag

Papers

Showing 1–25 of 39 papers

| Title | Status | Hype |
| --- | --- | --- |
| Slimming Down LLMs Without Losing Their Minds | | 0 |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | Code | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Code | 3 |
| Domain-Adaptive Continued Pre-Training of Small Language Models | | 0 |
| What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks | Code | 0 |
| More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment | | 0 |
| Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | | 0 |
| HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning | | 0 |
| FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering | Code | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | | 0 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Code | 1 |
| Towards Multilingual LLM Evaluation for European Languages | | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | | 0 |
| GRIN: GRadient-INformed MoE | | 0 |
| You can remove GPT2's LayerNorm by fine-tuning | Code | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0 |
| Promises, Outlooks and Challenges of Diffusion Language Modeling | | 0 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| SaGE: Evaluating Moral Consistency in Large Language Models | Code | 0 |
| Attacks on Node Attributes in Graph Neural Networks | Code | 0 |
| Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | Code | 1 |
| An Open Source Data Contamination Report for Large Language Models | Code | 1 |
| Who's Harry Potter? Approximate Unlearning in LLMs | | 0 |

No leaderboard results yet.