| Discrete Flow Matching | Jul 22, 2024 | HumanEval, MBPP | Unverified | 0 |
| MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | Jul 12, 2024 | HumanEval | Unverified | 0 |
| Brevity is the soul of wit: Pruning long files for code generation | Jun 29, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Towards Large Language Model Aided Program Refinement | Jun 26, 2024 | HumanEval, Language Modeling | Unverified | 0 |
| Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | Jun 20, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | Jun 18, 2024 | HumanEval, MBPP | Unverified | 0 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual Learning, GSM8K | Code Available | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | Jun 15, 2024 | Benchmarking, HumanEval | Unverified | 0 |
| Validating LLM-Generated Programs with Metamorphic Prompt Testing | Jun 11, 2024 | HumanEval | Unverified | 0 |
| PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | Jun 11, 2024 | Code Generation, HumanEval | Unverified | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | Jun 5, 2024 | GSM8K, HumanEval | Unverified | 0 |
| SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | May 30, 2024 | GSM8K, HumanEval | Unverified | 0 |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | May 30, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | May 29, 2024 | HumanEval | Unverified | 0 |
| Kotlin ML Pack: Technical Report | May 29, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Can Github issues be solved with Tree Of Thoughts? | May 20, 2024 | Code Generation, GitHub issue resolution | Code Available | 0 |
| On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | Apr 26, 2024 | Code Generation, HumanEval | Unverified | 0 |
| BASS: Batched Attention-optimized Speculative Sampling | Apr 24, 2024 | GPU, HumanEval | Unverified | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEval, MBPP | Unverified | 0 |
| Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | Apr 17, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Apr 11, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | Apr 1, 2024 | Code Generation, Hallucination | Unverified | 0 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | Mar 25, 2024 | HumanEval | Unverified | 0 |
| CodeShell Technical Report | Mar 23, 2024 | 8k, HumanEval | Unverified | 0 |