| ContraCLM: Contrastive Learning For Causal Language Model | Oct 3, 2022 | Code Generation, Code Search | Code Available | 1 |
| Fault-Aware Neural Code Rankers | Jun 4, 2022 | Code Generation, HumanEval | Code Available | 1 |
| Turning the Tide: Repository-based Code Reflection | Jul 14, 2025 | Code Generation, Diversity | —Unverified | 0 |
| SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | Jun 25, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code Completion, GSM8K | —Unverified | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 |
| Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees | Jun 17, 2025 | Code Translation, HumanEval | —Unverified | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | Jun 17, 2025 | ARC, CoLA | —Unverified | 0 |
| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | Jun 9, 2025 | GSM8K, HumanEval | —Unverified | 0 |
| SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation | May 30, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Actor-Critic based Online Data Mixing For Language Model Pre-Training | May 29, 2025 | HumanEval, Language Modeling | —Unverified | 0 |
| Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | May 29, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Self-Correcting Code Generation Using Small Language Models | May 29, 2025 | Code Generation, HumanEval | Code Available | 0 |
| An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | May 27, 2025 | Code Generation, Code Summarization | —Unverified | 0 |
| Evaluating Large Language Models for Code Review | May 26, 2025 | HumanEval | —Unverified | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | May 25, 2025 | GSM8K, HumanEval | —Unverified | 0 |
| From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation? | May 24, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Prior Prompt Engineering for Reinforcement Fine-Tuning | May 20, 2025 | HumanEval, Prompt Engineering | —Unverified | 0 |
| Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | May 19, 2025 | HumanEval, Math | Code Available | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code Generation, GSM8K | —Unverified | 0 |
| Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | May 12, 2025 | Code Generation, Comment Generation | Code Available | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | May 12, 2025 | GSM8K, HumanEval | —Unverified | 0 |
| CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts | May 8, 2025 | Code Completion, Code Generation | —Unverified | 0 |
| Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis | May 5, 2025 | Articles, HumanEval | —Unverified | 0 |
| The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models | May 5, 2025 | HumanEval, Program Repair | —Unverified | 0 |
| ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement | Apr 29, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Type-Constrained Code Generation with Language Models | Apr 12, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | Apr 5, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking, GSM8K | —Unverified | 0 |
| Can LLMs Enable Verification in Mainstream Programming? | Mar 18, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models | Mar 10, 2025 | HumanEval, Program Synthesis | —Unverified | 0 |
| Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | Mar 7, 2025 | Benchmarking, Bug fixing | —Unverified | 0 |
| Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | Mar 7, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Mar 6, 2025 | Benchmarking, HumanEval | Code Available | 0 |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | Feb 27, 2025 | GSM8K, HumanEval | —Unverified | 0 |
| Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | Feb 26, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | Feb 17, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Feb 13, 2025 | 8k, GPU | Code Available | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | —Unverified | 0 |
| Large Language Model Guided Self-Debugging Code Generation | Feb 5, 2025 | Code Generation, Computational Efficiency | —Unverified | 0 |
| ACECODER: Acing Coder RL via Automated Test-Case Synthesis | Feb 3, 2025 | HumanEval, mbpp | —Unverified | 0 |
| Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities | Jan 31, 2025 | Code Generation, Hallucination | —Unverified | 0 |
| CoCoNUT: Structural Code Understanding does not fall out of a tree | Jan 27, 2025 | Code Generation, HumanEval | Code Available | 0 |
| QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | Jan 20, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs | Jan 14, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks | Jan 11, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Dafny as Verification-Aware Intermediate Language for Code Generation | Jan 10, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | Jan 6, 2025 | GSM8K, HumanEval | —Unverified | 0 |
| Dynamic Scaling of Unit Tests for Code Reward Modeling | Jan 2, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Dec 30, 2024 | Benchmarking, Code Generation | —Unverified | 0 |