| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | —Unverified | 0 |
| Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation | May 16, 2025 | Math, MMLU | —Unverified | 0 |
| Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting | Jun 17, 2024 | Ethics, MMLU | —Unverified | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | May 12, 2025 | GSM8K, HumanEval | —Unverified | 0 |
| GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs | Nov 21, 2024 | MMLU, Text Generation | —Unverified | 0 |
| Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling | Jun 21, 2024 | Clustering, MMLU | —Unverified | 0 |
| DEM: Distribution Edited Model for Training with Mixed Data Distributions | Jun 21, 2024 | Diversity, Instruction Following | —Unverified | 0 |
| Detecting Benchmark Contamination Through Watermarking | Feb 24, 2025 | ARC, MMLU | —Unverified | 0 |
| Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | Feb 26, 2025 | GSM8K, MMLU | —Unverified | 0 |
| Distributional Scaling Laws for Emergent Capabilities | Feb 24, 2025 | MMLU | —Unverified | 0 |
| DNA 1.0 Technical Report | Jan 18, 2025 | Belebele, GSM8K | —Unverified | 0 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | Jun 5, 2024 | GSM8K, HumanEval | —Unverified | 0 |
| Do Large Language Models Mirror Cognitive Language Processing? | Feb 28, 2024 | Chatbot, Logical Reasoning | —Unverified | 0 |
| Domain-Adaptive Continued Pre-Training of Small Language Models | Apr 13, 2025 | Domain Adaptation, HellaSwag | —Unverified | 0 |
| DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining | Sep 30, 2024 | Continual Pretraining, Domain Adaptation | —Unverified | 0 |
| Dual Decomposition of Weights and Singular Value Low Rank Adaptation | May 20, 2025 | GSM8K, MMLU | —Unverified | 0 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation, HumanEval | —Unverified | 0 |
| Effectiveness of Zero-shot-CoT in Japanese Prompts | Mar 9, 2025 | Abstract Algebra, College Mathematics | —Unverified | 0 |
| Efficient Data Selection at Scale via Influence Distillation | May 25, 2025 | GSM8K, MMLU | —Unverified | 0 |
| Efficient Federated Search for Retrieval-Augmented Generation | Feb 26, 2025 | MMLU, RAG | —Unverified | 0 |
| Efficiently Deploying LLMs with Controlled Risk | Oct 3, 2024 | MMLU, TruthfulQA | —Unverified | 0 |
| Efficient Model Development through Fine-tuning Transfer | Mar 25, 2025 | MMLU, model | —Unverified | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | Chatbot, GSM8K | —Unverified | 0 |
| Eir: Thai Medical Large Language Models | Sep 13, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARC, Belebele | —Unverified | 0 |
| Enterprise Large Language Model Evaluation Benchmark | Jun 25, 2025 | Language Model Evaluation, Language Modeling | —Unverified | 0 |
| Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | Mar 19, 2025 | counterfactual, Decision Making | —Unverified | 0 |
| Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Feb 24, 2025 | Mixture-of-Experts, MMLU | —Unverified | 0 |
| Evaluation of large language models using an Indian language LGBTI+ lexicon | Oct 26, 2023 | Machine Translation, MMLU | —Unverified | 0 |
| Few-Shot Recalibration of Language Models | Mar 27, 2024 | Math, MMLU | —Unverified | 0 |
| FRAMES: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy | Feb 8, 2025 | MMLU | —Unverified | 0 |
| GAAPO: Genetic Algorithmic Applied to Prompt Optimization | Apr 9, 2025 | MMLU, Prompt Engineering | —Unverified | 0 |
| Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models | Nov 29, 2024 | MMLU | —Unverified | 0 |
| Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training | Jun 18, 2025 | MedQA, MMLU | —Unverified | 0 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Oct 15, 2024 | HumanEval, Language Modelling | —Unverified | 0 |
| GEB-1.3B: Open Lightweight Large Language Model | Jun 14, 2024 | CPU, Language Modeling | —Unverified | 0 |
| GECKO: Generative Language Model for English, Code and Korean | May 24, 2024 | kmmlu, Language Modeling | —Unverified | 0 |
| GEM: Empowering LLM for both Embedding Generation and Language Understanding | Jun 4, 2025 | Decoder, Large Language Model | —Unverified | 0 |
| A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets | May 9, 2025 | MMLU | —Unverified | 0 |
| Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation | Dec 4, 2024 | MMLU | —Unverified | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwag, HumanEval | —Unverified | 0 |
| HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | Jan 26, 2025 | MMLU, Multiple-choice | —Unverified | 0 |
| Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models | Feb 19, 2024 | MMLU | —Unverified | 0 |
| Humanity's Last Exam | Jan 24, 2025 | Humanity's Last Exam, Language Modeling | —Unverified | 0 |
| Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents | Dec 1, 2024 | Mathematical Reasoning, MMLU | —Unverified | 0 |
| Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training | Feb 5, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | Oct 23, 2024 | GSM8K, HumanEval | —Unverified | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | Feb 2, 2025 | Math, MMLU | —Unverified | 0 |
| Actor-Critic based Online Data Mixing For Language Model Pre-Training | May 29, 2025 | HumanEval, Language Modeling | —Unverified | 0 |
| Revisiting Uncertainty Estimation and Calibration of Large Language Models | May 29, 2025 | Mixture-of-Experts, MMLU | —Unverified | 0 |