| Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | May 29, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Actor-Critic based Online Data Mixing For Language Model Pre-Training | May 29, 2025 | HumanEval, Language Modeling | Unverified | 0 |
| Self-Correcting Code Generation Using Small Language Models | May 29, 2025 | Code Generation, HumanEval | Code Available | 0 |
| An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | May 27, 2025 | Code Generation, Code Summarization | Unverified | 0 |
| Evaluating Large Language Models for Code Review | May 26, 2025 | HumanEval | Unverified | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | May 25, 2025 | GSM8K, HumanEval | Unverified | 0 |
| From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation? | May 24, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Prior Prompt Engineering for Reinforcement Fine-Tuning | May 20, 2025 | HumanEval, Prompt Engineering | Unverified | 0 |
| Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | May 20, 2025 | HumanEval, mbpp | Code Available | 1 |
| Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | May 19, 2025 | HumanEval, Math | Code Available | 0 |