| Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy | Jan 20, 2025 | MMLU | Code Available | 0 |
| ORBIT: Cost-Effective Dataset Curation for Large Language Model Domain Adaptation with an Astronomy Case Study | Dec 19, 2024 | Astronomy, Domain Adaptation | Code Available | 0 |
| CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions | Oct 4, 2024 | Instruction Following, MMLU | Code Available | 0 |
| BenTo: Benchmark Task Reduction with In-Context Transferability | Oct 17, 2024 | In-Context Learning, MMLU | Code Available | 0 |
| LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Oct 4, 2024 | Diversity, Ensemble Pruning | Code Available | 0 |
| Performance Law of Large Language Models | Aug 19, 2024 | MMLU | Code Available | 0 |
| Evaluation of Large Language Models via Coupled Token Generation | Feb 3, 2025 | Chatbot, Large Language Model | Code Available | 0 |
| Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations | Aug 27, 2023 | Instruction Following, MMLU | Code Available | 0 |
| Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function | Jun 3, 2024 | Diversity, MMLU | Code Available | 0 |
| Post-Hoc Reversal: Are We Selecting Models Prematurely? | Apr 11, 2024 | Language Modelling, MMLU | Code Available | 0 |