| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Mar 30, 2023 | BenchmarkingCode Generation | CodeCode Available | 5 | 5 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Feb 22, 2024 | Code GenerationHumanEval | CodeCode Available | 5 | 5 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | ChatbotHumanEval | CodeCode Available | 5 | 5 |
| Magicoder: Empowering Code Generation with OSS-Instruct | Dec 4, 2023 | Code GenerationHumanEval | CodeCode Available | 4 | 5 |
| Scaling Granite Code Models to 128K Context | Jul 18, 2024 | 2k4k | CodeCode Available | 4 | 5 |
| Baichuan 2: Open Large-scale Language Models | Sep 19, 2023 | Feature EngineeringGSM8K | CodeCode Available | 4 | 5 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Mar 20, 2023 | Decision MakingHumanEval | CodeCode Available | 4 | 5 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Feb 25, 2024 | Code GenerationHumanEval | CodeCode Available | 4 | 5 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Jan 5, 2024 | HumanEvalPrediction | CodeCode Available | 4 | 5 |
| Evaluating Large Language Models Trained on Code | Jul 7, 2021 | Code GenerationHumanEval | CodeCode Available | 3 | 5 |