| ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement | Apr 29, 2025 | Code Generation, HumanEval | Unverified | 0 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Apr 15, 2025 | ARC, HellaSwag | Code available | 3 |
| Type-Constrained Code Generation with Language Models | Apr 12, 2025 | Code Generation, HumanEval | Unverified | 0 |
| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | Apr 5, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking, GSM8K | Unverified | 0 |
| Can LLMs Enable Verification in Mainstream Programming? | Mar 18, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models | Mar 10, 2025 | HumanEval, Program Synthesis | Unverified | 0 |
| RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing | Mar 10, 2025 | Code Generation, HumanEval | Code available | 1 |
| Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | Mar 7, 2025 | Benchmarking, Bug fixing | Unverified | 0 |
| Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | Mar 7, 2025 | Code Generation, HumanEval | Unverified | 0 |