| LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient | Feb 2, 2025 | MMLU | Code Available | 0 | 5 |
| LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Oct 4, 2024 | Diversity, Ensemble Pruning | Code Available | 0 | 5 |
| Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Jul 15, 2025 | Diversity, MMLU | Code Available | 0 | 5 |
| Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation | May 16, 2025 | Math, MMLU | —Unverified | 0 | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwag, HumanEval | —Unverified | 0 | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | —Unverified | 0 | 0 |
| Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation | Dec 4, 2024 | MMLU | —Unverified | 0 | 0 |
| Cost-Saving LLM Cascades with Early Abstention | Feb 13, 2025 | GSM8K, MMLU | —Unverified | 0 | 0 |
| GEM: Empowering LLM for both Embedding Generation and Language Understanding | Jun 4, 2025 | Decoder, Large Language Model | —Unverified | 0 | 0 |
| Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning | Jul 2, 2024 | Active Learning, Language Modelling | —Unverified | 0 | 0 |