| Reinforcing General Reasoning without Verifiers | May 27, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Accurate LoRA-Finetuning Quantization of LLMs via Information Retention | Feb 8, 2024 | MMLU, Quantization | Code Available | 2 |
| Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Nov 8, 2023 | HumanEval, MMLU | Code Available | 2 |
| A StrongREJECT for Empty Jailbreaks | Feb 15, 2024 | MMLU | Code Available | 2 |
| EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models | Dec 11, 2023 | Benchmarking, Emotional Intelligence | Code Available | 2 |
| AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | Apr 21, 2024 | MMLU, Red Teaming | Code Available | 2 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | Aug 5, 2022 | Fact Checking, Few-Shot Learning | Code Available | 2 |
| Inheritune: Training Smaller Yet More Attentive Language Models | Apr 12, 2024 | Decoder, Language Modelling | Code Available | 2 |
| MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Dec 19, 2024 | MMLU, Multiple-choice | Code Available | 2 |
| Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models | Mar 28, 2025 | MMLU, Quantization | Code Available | 2 |