| A StrongREJECT for Empty Jailbreaks | Feb 15, 2024 | MMLU | CodeCode Available | 2 |
| Accurate LoRA-Finetuning Quantization of LLMs via Information Retention | Feb 8, 2024 | MMLUQuantization | CodeCode Available | 2 |
| Routoo: Learning to Route to Large Language Models Effectively | Jan 25, 2024 | MMLUMulti-task Language Understanding | CodeCode Available | 2 |
| Aurora:Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning | Dec 22, 2023 | Instruction FollowingMixture-of-Experts | CodeCode Available | 2 |
| EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models | Dec 11, 2023 | BenchmarkingEmotional Intelligence | CodeCode Available | 2 |
| MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning | Nov 16, 2023 | MedQAMMLU | CodeCode Available | 2 |
| Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Nov 8, 2023 | HumanEvalMMLU | CodeCode Available | 2 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | Aug 5, 2022 | Fact CheckingFew-Shot Learning | CodeCode Available | 2 |
| The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains | Jul 8, 2025 | MathMMLU | CodeCode Available | 1 |
| SiLVR: A Simple Language-based Video Reasoning Framework | May 30, 2025 | MathMME | CodeCode Available | 1 |