| AgentInstruct: Toward Generative Teaching with Agentic Flows | Jul 3, 2024 | GSM8K, MMLU | —Unverified | 0 |
| Evaluation of large language models using an Indian language LGBTI+ lexicon | Oct 26, 2023 | Machine Translation, MMLU | —Unverified | 0 |
| Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Feb 24, 2025 | Mixture-of-Experts, MMLU | —Unverified | 0 |
| LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering | Dec 13, 2024 | Few-Shot Learning, Knowledge Distillation | —Unverified | 0 |
| Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | Mar 19, 2025 | counterfactual, Decision Making | —Unverified | 0 |
| Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models | Nov 29, 2024 | MMLU | —Unverified | 0 |
| Enterprise Large Language Model Evaluation Benchmark | Jun 25, 2025 | Language Model Evaluation, Language Modeling | —Unverified | 0 |
| A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets | May 9, 2025 | MMLU | —Unverified | 0 |
| LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction | Apr 1, 2024 | Image Captioning, Instruction Following | —Unverified | 0 |
| Eir: Thai Medical Large Language Models | Sep 13, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| AcademicGPT: Empowering Academic Research | Nov 21, 2023 | Abstract generation, General Knowledge | —Unverified | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARC, Belebele | —Unverified | 0 |
| Uncovering Latent Chain of Thought Vectors in Language Models | Sep 21, 2024 | ARC, GSM8K | —Unverified | 0 |
| Large Language Model Compression with Neural Architecture Search | Oct 9, 2024 | Instruction Following, Language Modeling | —Unverified | 0 |
| LM-Cocktail: Resilient Tuning of Language Models via Model Merging | Nov 22, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Changing Answer Order Can Decrease MMLU Accuracy | Jun 27, 2024 | MMLU, Multiple-choice | —Unverified | 0 |
| Efficient Model Development through Fine-tuning Transfer | Mar 25, 2025 | MMLU, model | —Unverified | 0 |
| Efficiently Deploying LLMs with Controlled Risk | Oct 3, 2024 | MMLU, TruthfulQA | —Unverified | 0 |
| Efficient Federated Search for Retrieval-Augmented Generation | Feb 26, 2025 | MMLU, RAG | —Unverified | 0 |
| Efficient Data Selection at Scale via Influence Distillation | May 25, 2025 | GSM8K, MMLU | —Unverified | 0 |
| ChainRank-DPO: Chain Rank Direct Preference Optimization for LLM Rankers | Dec 18, 2024 | MMLU, Reranking | —Unverified | 0 |
| Effectiveness of Zero-shot-CoT in Japanese Prompts | Mar 9, 2025 | Abstract Algebra, College Mathematics | —Unverified | 0 |
| From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment | Jun 7, 2025 | ARC, MMLU | —Unverified | 0 |
| Lizard: An Efficient Linearization Framework for Large Language Models | Jul 11, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| B-score: Detecting biases in large language models using response history | May 24, 2025 | MMLU | —Unverified | 0 |