| Kotlin ML Pack: Technical Report | May 29, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| KV Prediction for Improved Time to First Token | Oct 10, 2024 | Code Completion, CPU | —Unverified | 0 | 0 |
| Large Language Model Guided Self-Debugging Code Generation | Feb 5, 2025 | Code Generation, Computational Efficiency | —Unverified | 0 | 0 |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | Feb 27, 2025 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Learning to Reason via Self-Iterative Process Feedback for Small Language Models | Dec 11, 2024 | Domain Generalization, GSM8K | —Unverified | 0 | 0 |
| Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs | Jan 14, 2025 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | Mar 12, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | May 25, 2025 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | Jun 17, 2025 | ARC, CoLA | —Unverified | 0 | 0 |
| LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression | Sep 25, 2023 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | Apr 17, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | Jul 12, 2024 | HumanEval | —Unverified | 0 | 0 |
| USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | Sep 9, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis | May 5, 2025 | Articles, HumanEval | —Unverified | 0 | 0 |
| MojoBench: Language Modeling and Benchmarks for Mojo | Oct 23, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | Jan 11, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEval, MBPP | —Unverified | 0 | 0 |
| NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | Jan 29, 2024 | HumanEval | —Unverified | 0 | 0 |
| On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | Apr 26, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | Apr 5, 2025 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | Jul 27, 2023 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | Nov 13, 2023 | Code Completion, HumanEval | —Unverified | 0 | 0 |
| PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation | Dec 17, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic? | Oct 26, 2022 | HumanEval, Language Modelling | —Unverified | 0 | 0 |