| How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data | Sep 5, 2024 | Code Generation, Diversity | Code Available | 1 | 5 |
| Better & Faster Large Language Models via Multi-token Prediction | Apr 30, 2024 | HumanEval, mbpp | Code Available | 1 | 5 |
| Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models | Feb 24, 2024 | HumanEval, Memorization | Code Available | 1 | 5 |
| CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models | Feb 23, 2025 | Code Generation, HumanEval | Code Available | 1 | 5 |
| ContraCLM: Contrastive Learning For Causal Language Model | Oct 3, 2022 | Code Generation, Code Search | Code Available | 1 | 5 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8K, HumanEval | Code Available | 1 | 5 |
| ANPL: Towards Natural Programming with Interactive Decomposition | May 29, 2023 | ARC, Code Generation | Code Available | 1 | 5 |
| Multi-lingual Evaluation of Code Generation Models | Oct 26, 2022 | Code Completion, Code Generation | Code Available | 1 | 5 |
| How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark | Jun 10, 2024 | HumanEval, Program Synthesis | Code Available | 1 | 5 |
| RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing | Mar 10, 2025 | Code Generation, HumanEval | Code Available | 1 | 5 |