| Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | May 20, 2025 | HumanEval, mbpp | Code Available | 1 | 5 |
| ContraCLM: Contrastive Learning For Causal Language Model | Oct 3, 2022 | Code Generation, Code Search | Code Available | 1 | 5 |
| AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation | Oct 1, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Can Programming Languages Boost Each Other via Instruction Tuning? | Aug 31, 2023 | HumanEval | Code Available | 0 | 5 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Dec 25, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| HumanEval on Latest GPT Models -- 2024 | Feb 20, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | May 19, 2025 | HumanEval, Math | Code Available | 0 | 5 |
| Can Github issues be solved with Tree Of Thoughts? | May 20, 2024 | Code Generation, GitHub issue resolution | Code Available | 0 | 5 |
| FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system | Oct 28, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study | Mar 22, 2024 | Code Completion, HumanEval | Code Available | 0 | 5 |
| RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | Oct 2, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Measuring the Influence of Incorrect Code on Test Generation | Sep 14, 2024 | HumanEval, Large Language Model | Code Available | 0 | 5 |
| Large Language Models of Code Fail at Completing Code with Potential Bugs | Jun 6, 2023 | Code Completion, HumanEval | Code Available | 0 | 5 |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Sep 27, 2023 | HumanEval, Language Modeling | Code Available | 0 | 5 |
| Using Large Language Models to Generate JUnit Tests: An Empirical Study | Apr 30, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Mar 6, 2025 | Benchmarking, HumanEval | Code Available | 0 | 5 |
| Evaluating How Fine-tuning on Bimodal Data Effects Code Generation | Nov 15, 2022 | Code Generation, HumanEval | Code Available | 0 | 5 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Jan 15, 2024 | HumanEval, Language Modelling | Code Available | 0 | 5 |
| Self-Correcting Code Generation Using Small Language Models | May 29, 2025 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Self-Edit: Fault-Aware Code Editor for Code Generation | May 6, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Sep 29, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 | 5 |
| Multi-Programming Language Ensemble for Code Generation in Large Language Model | Sep 6, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code Available | 0 | 5 |
| mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Oct 19, 2024 | Code Generation, Diversity | Code Available | 0 | 5 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Apr 11, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual Learning, GSM8K | Code Available | 0 | 5 |
| Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers | Nov 26, 2024 | HumanEval, mbpp | Code Available | 0 | 5 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | May 13, 2023 | Arithmetic Reasoning, Code Completion | Code Available | 0 | 5 |
| Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | May 12, 2025 | Code Generation, Comment Generation | Code Available | 0 | 5 |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Oct 28, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 | 5 |
| CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Feb 13, 2025 | 8k, GPU | Code Available | 0 | 5 |
| CoCoNUT: Structural Code Understanding does not fall out of a tree | Jan 27, 2025 | Code Generation, HumanEval | Code Available | 0 | 5 |
| InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation | Nov 1, 2024 | Code Translation, HumanEval | Code Available | 0 | 5 |
| KV Prediction for Improved Time to First Token | Oct 10, 2024 | Code Completion, CPU | Code Available | 0 | 5 |
| Software Vulnerability and Functionality Assessment using LLMs | Mar 13, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| ACECODER: Acing Coder RL via Automated Test-Case Synthesis | Feb 3, 2025 | HumanEval, mbpp | Unverified | 0 | 0 |
| Actor-Critic based Online Data Mixing For Language Model Pre-Training | May 29, 2025 | HumanEval, Language Modeling | Unverified | 0 | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | Oct 23, 2024 | GSM8K, HumanEval | Unverified | 0 | 0 |
| Addressing Data Leakage in HumanEval Using Combinatorial Test Design | Dec 2, 2024 | HumanEval | Unverified | 0 | 0 |
| AIME: AI System Optimization via Multiple LLM Evaluators | Oct 4, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Aligning CodeLLMs with Direct Preference Optimization | Oct 24, 2024 | Decision Making, HumanEval | Unverified | 0 | 0 |
| AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | Dec 9, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | May 27, 2025 | Code Generation, Code Summarization | Unverified | 0 | 0 |
| A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks | Nov 23, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement | Apr 29, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining | Sep 3, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| A Review of Repository Level Prompting for LLMs | Dec 15, 2023 | Code Completion, Code Generation | Unverified | 0 | 0 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation, HumanEval | Unverified | 0 | 0 |