| MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | May 18, 2024 | Code Generation, HumanEval | Code Available | 2 |
| any4: Learned 4-bit Numeric Representation for LLMs | Jul 7, 2025 | GPU, GSM8K | Code Available | 2 |
| Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Oct 6, 2023 | Code Generation, Decision Making | Code Available | 2 |
| MasRouter: Learning to Route LLMs for Multi-Agent Systems | Feb 16, 2025 | HumanEval, mbpp | Code Available | 2 |
| Instruction Tuning With Loss Over Instructions | May 23, 2024 | HumanEval, MMLU | Code Available | 1 |
| InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Mar 11, 2024 | Code Generation, HumanEval | Code Available | 1 |
| InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct | Jul 8, 2024 | Code Generation, Code Summarization | Code Available | 1 |
| HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization | Feb 26, 2024 | Code Generation, HumanEval | Code Available | 1 |
| Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | May 20, 2025 | HumanEval, mbpp | Code Available | 1 |
| HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | Dec 30, 2024 | Code Generation, HumanEval | Code Available | 1 |