| LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression | Sep 25, 2023 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | Apr 17, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | Jul 12, 2024 | HumanEval | —Unverified | 0 | 0 |
| USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | Sep 9, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis | May 5, 2025 | Articles, HumanEval | —Unverified | 0 | 0 |
| MojoBench: Language Modeling and Benchmarks for Mojo | Oct 23, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | Jan 11, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEval, MBPP | —Unverified | 0 | 0 |
| NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | Jan 29, 2024 | HumanEval | —Unverified | 0 | 0 |
| On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | Apr 26, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |