| Discrete Flow Matching | Jul 22, 2024 | HumanEval, mbpp | Unverified | 0 |
| MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | Jul 12, 2024 | HumanEval | Unverified | 0 |
| Brevity is the soul of wit: Pruning long files for code generation | Jun 29, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Towards Large Language Model Aided Program Refinement | Jun 26, 2024 | HumanEval, Language Modeling | Unverified | 0 |
| Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | Jun 20, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | Jun 18, 2024 | HumanEval, mbpp | Unverified | 0 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual Learning, GSM8K | Code Available | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | Jun 15, 2024 | Benchmarking, HumanEval | Unverified | 0 |
| Validating LLM-Generated Programs with Metamorphic Prompt Testing | Jun 11, 2024 | HumanEval | Unverified | 0 |
| PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | Jun 11, 2024 | Code Generation, HumanEval | Unverified | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | Jun 5, 2024 | GSM8K, HumanEval | Unverified | 0 |
| SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | May 30, 2024 | GSM8K, HumanEval | Unverified | 0 |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | May 30, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | May 29, 2024 | HumanEval | Unverified | 0 |
| Kotlin ML Pack: Technical Report | May 29, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Can Github issues be solved with Tree Of Thoughts? | May 20, 2024 | Code Generation, GitHub issue resolution | Code Available | 0 |
| On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | Apr 26, 2024 | Code Generation, HumanEval | Unverified | 0 |
| BASS: Batched Attention-optimized Speculative Sampling | Apr 24, 2024 | GPU, HumanEval | Unverified | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEval, mbpp | Unverified | 0 |
| Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | Apr 17, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Apr 11, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | Apr 1, 2024 | Code Generation, Hallucination | Unverified | 0 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | Mar 25, 2024 | HumanEval | Unverified | 0 |
| CodeShell Technical Report | Mar 23, 2024 | 8k, HumanEval | Unverified | 0 |
| SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | Mar 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study | Mar 22, 2024 | Code Completion, HumanEval | Code Available | 0 |
| Software Vulnerability and Functionality Assessment using LLMs | Mar 13, 2024 | Code Generation, HumanEval | Unverified | 0 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation, HumanEval | Unverified | 0 |
| LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | Mar 12, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Test-Driven Development for Code Generation | Feb 21, 2024 | Code Generation, HumanEval | Unverified | 0 |
| HumanEval on Latest GPT Models -- 2024 | Feb 20, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | Unverified | 0 |
| NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | Jan 29, 2024 | HumanEval | Unverified | 0 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Jan 15, 2024 | HumanEval, Language Modelling | Code Available | 0 |
| Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | Jan 11, 2024 | Code Generation, HumanEval | Unverified | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | Jan 8, 2024 | Code Generation, Diversity | Unverified | 0 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Dec 25, 2023 | Code Generation, HumanEval | Code Available | 0 |
| A Review of Repository Level Prompting for LLMs | Dec 15, 2023 | Code Completion, Code Generation | Unverified | 0 |
| Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | Dec 5, 2023 | Code Generation, HumanEval | Unverified | 0 |
| Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | Nov 13, 2023 | Code Completion, HumanEval | Unverified | 0 |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Oct 28, 2023 | Code Generation, HumanEval | Code Available | 0 |
| Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | Oct 16, 2023 | Code Generation, HumanEval | Unverified | 0 |
| CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | Oct 10, 2023 | Code Generation, Code Translation | Unverified | 0 |
| The Program Testing Ability of Large Language Models for Code | Oct 9, 2023 | HumanEval, mbpp | Unverified | 0 |
| Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Sep 29, 2023 | Code Generation, HumanEval | Code Available | 0 |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Sep 27, 2023 | HumanEval, Language Modeling | Code Available | 0 |
| LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression | Sep 25, 2023 | Code Generation, HumanEval | Unverified | 0 |
| Can Programming Languages Boost Each Other via Instruction Tuning? | Aug 31, 2023 | HumanEval | Code Available | 0 |
| CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation | Aug 17, 2023 | Code Generation, Few-Shot Learning | Unverified | 0 |