| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Jun 18, 2024 | All, GSM8K | Code Available | 14 | 5 |
| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 13 | 5 |
| AutoDev: Automated AI-Driven Development | Mar 13, 2024 | Code Generation, HumanEval | Code Available | 11 | 5 |
| LLM4Decompile: Decompiling Binary Code with Large Language Models | Mar 8, 2024 | HumanEval | Code Available | 9 | 5 |
| CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Mar 14, 2024 | HumanEval | Code Available | 7 | 5 |
| CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Aug 7, 2024 | HumanEval, mbpp | Code Available | 7 | 5 |
| Code Llama: Open Foundation Models for Code | Aug 24, 2023 | 16k, Code Generation | Code Available | 6 | 5 |
| CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | Mar 25, 2022 | Code Generation, HumanEval | Code Available | 6 | 5 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Feb 22, 2024 | Code Generation, HumanEval | Code Available | 5 | 5 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 | 5 |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Mar 30, 2023 | Benchmarking, Code Generation | Code Available | 5 | 5 |
| StarCoder: may the source be with you! | May 9, 2023 | 8k, Code Generation | Code Available | 5 | 5 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Jun 14, 2023 | Code Generation, HumanEval | Code Available | 5 | 5 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Feb 25, 2024 | Code Generation, HumanEval | Code Available | 4 | 5 |
| Scaling Granite Code Models to 128K Context | Jul 18, 2024 | 2k, 4k | Code Available | 4 | 5 |
| Baichuan 2: Open Large-scale Language Models | Sep 19, 2023 | Feature Engineering, GSM8K | Code Available | 4 | 5 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Mar 20, 2023 | Decision Making, HumanEval | Code Available | 4 | 5 |
| Magicoder: Empowering Code Generation with OSS-Instruct | Dec 4, 2023 | Code Generation, HumanEval | Code Available | 4 | 5 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Jan 5, 2024 | HumanEval, Prediction | Code Available | 4 | 5 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Apr 25, 2024 | GSM8K, HellaSwag | Code Available | 3 | 5 |
| Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | May 2, 2023 | Code Generation, HumanEval | Code Available | 3 | 5 |
| KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | Mar 4, 2025 | HumanEval, mbpp | Code Available | 3 | 5 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Apr 15, 2025 | ARC, HellaSwag | Code Available | 3 | 5 |
| Evaluating Large Language Models Trained on Code | Jul 7, 2021 | Code Generation, HumanEval | Code Available | 3 | 5 |
| Automatic Instruction Evolving for Large Language Models | Jun 2, 2024 | GSM8K, HumanEval | Code Available | 3 | 5 |