| Training Compute-Optimal Large Language Models | Mar 29, 2022 | Anachronisms, Analogical Similarity | Code Available | 6 | 5 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Apr 25, 2024 | GSM8K, HellaSwag | Code Available | 3 | 5 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Apr 15, 2025 | ARC, HellaSwag | Code Available | 3 | 5 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 | 5 |
| When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation | Mar 17, 2022 | Data Augmentation, HellaSwag | Code Available | 1 | 5 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | Mar 24, 2021 | Common Sense Reasoning, HellaSwag | Code Available | 1 | 5 |
| Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | Dec 29, 2023 | HellaSwag | Code Available | 1 | 5 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Oct 27, 2024 | GSM8K, HellaSwag | Code Available | 1 | 5 |
| An Open Source Data Contamination Report for Large Language Models | Oct 26, 2023 | HellaSwag, Language Modeling | Code Available | 1 | 5 |
| In-Contextual Gender Bias Suppression for Large Language Models | Sep 13, 2023 | counterfactual, Data Augmentation | Code Available | 0 | 5 |