| SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | Mar 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study | Mar 22, 2024 | Code Completion, HumanEval | Code Available | 0 |
| CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Mar 14, 2024 | HumanEval | Code Available | 7 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation, HumanEval | Unverified | 0 |
| Software Vulnerability and Functionality Assessment using LLMs | Mar 13, 2024 | Code Generation, HumanEval | Unverified | 0 |
| AutoDev: Automated AI-Driven Development | Mar 13, 2024 | Code Generation, HumanEval | Code Available | 11 |
| LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | Mar 12, 2024 | Code Generation, HumanEval | Unverified | 0 |
| InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Mar 11, 2024 | Code Generation, HumanEval | Code Available | 1 |
| LLM4Decompile: Decompiling Binary Code with Large Language Models | Mar 8, 2024 | HumanEval | Code Available | 9 |
| HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization | Feb 26, 2024 | Code Generation, HumanEval | Code Available | 1 |