| What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks | Apr 10, 2025 | Common Sense Reasoning, HellaSwag | Code Available | 0 | 5 |
| Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs | Sep 30, 2024 | ARC, Diversity | Unverified | 0 | 0 |
| Promises, Outlooks and Challenges of Diffusion Language Modeling | Jun 17, 2024 | ARC, HellaSwag | Unverified | 0 | 0 |
| Comparing Test Sets with Item Response Theory | Jun 1, 2021 | HellaSwag, Natural Language Understanding | Unverified | 0 | 0 |
| English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too | May 26, 2020 | Cross-Lingual Transfer, HellaSwag | Unverified | 0 | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | May 20, 2025 | ARC, GSM8K | Unverified | 0 | 0 |
| When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation | Nov 16, 2021 | Data Augmentation, HellaSwag | Unverified | 0 | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational Efficiency, GSM8K | Unverified | 0 | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | Dec 11, 2024 | ARC, GSM8K | Unverified | 0 | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8K, HellaSwag | Unverified | 0 | 0 |