| Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Apr 10, 2025 | Math, MMLU | Code Available | 1 |
| GAAPO: Genetic Algorithmic Applied to Prompt Optimization | Apr 9, 2025 | MMLU, Prompt Engineering | Unverified | 0 |
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Apr 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | Apr 4, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment | Apr 3, 2025 | ARC, HellaSwag | Unverified | 0 |
| YourBench: Easy Custom Evaluation Sets for Everyone | Apr 2, 2025 | MMLU | Code Available | 3 |
| Order Independence With Finetuning | Mar 30, 2025 | ARC, Language Modeling | Unverified | 0 |
| Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models | Mar 28, 2025 | MMLU, Quantization | Code Available | 2 |
| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General Knowledge, Math | Code Available | 0 |
| Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark | Mar 26, 2025 | MMLU, Multiple-choice | Code Available | 1 |