| Title | Date | Tags | Code | Count |
| --- | --- | --- | --- | --- |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational Efficiency, GSM8K | Unverified | 0 |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | May 30, 2025 | Continual Pretraining, Fairness | Code Available | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | May 20, 2025 | ARC, GSM8K | Unverified | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARC, Belebele | Unverified | 0 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Apr 15, 2025 | ARC, HellaSwag | Code Available | 3 |
| Domain-Adaptive Continued Pre-Training of Small Language Models | Apr 13, 2025 | Domain Adaptation, HellaSwag | Unverified | 0 |
| What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks | Apr 10, 2025 | Common Sense Reasoning, HellaSwag | Code Available | 0 |
| More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment | Apr 3, 2025 | ARC, HellaSwag | Unverified | 0 |
| Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | Feb 20, 2025 | HellaSwag, Memorization | Unverified | 0 |
| HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning | Feb 17, 2025 | HellaSwag | Unverified | 0 |
| FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering | Jan 13, 2025 | Descriptive, HellaSwag | Code Available | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | Dec 11, 2024 | ARC, GSM8K | Unverified | 0 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Oct 27, 2024 | GSM8K, HellaSwag | Code Available | 1 |
| Towards Multilingual LLM Evaluation for European Languages | Oct 11, 2024 | ARC, GSM8K | Unverified | 0 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | Sep 30, 2024 | ARC, Diversity | Unverified | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwag, HumanEval | Unverified | 0 |
| You can remove GPT2's LayerNorm by fine-tuning | Sep 6, 2024 | HellaSwag | Code Available | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 |
| Promises, Outlooks and Challenges of Diffusion Language Modeling | Jun 17, 2024 | ARC, HellaSwag | Unverified | 0 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Apr 25, 2024 | GSM8K, HellaSwag | Code Available | 3 |
| SaGE: Evaluating Moral Consistency in Large Language Models | Feb 21, 2024 | Decision Making, HellaSwag | Code Available | 0 |
| Attacks on Node Attributes in Graph Neural Networks | Feb 19, 2024 | Contrastive Learning, HellaSwag | Code Available | 0 |
| Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | Dec 29, 2023 | HellaSwag | Code Available | 1 |
| An Open Source Data Contamination Report for Large Language Models | Oct 26, 2023 | HellaSwag, Language Modeling | Code Available | 1 |
| Who's Harry Potter? Approximate Unlearning in LLMs | Oct 3, 2023 | ARC, GPU | Unverified | 0 |