| any4: Learned 4-bit Numeric Representation for LLMs | Jul 7, 2025 | GPU, GSM8K | Code Available | 2 | 5 |
| CodeT: Code Generation with Generated Tests | Jul 21, 2022 | Code Generation, HumanEval | Code Available | 2 | 5 |
| Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Oct 6, 2023 | Code Generation, Decision Making | Code Available | 2 | 5 |
| MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | May 18, 2024 | Code Generation, HumanEval | Code Available | 2 | 5 |
| Instruction Tuning With Loss Over Instructions | May 23, 2024 | HumanEval, MMLU | Code Available | 1 | 5 |
| InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Mar 11, 2024 | Code Generation, HumanEval | Code Available | 1 | 5 |
| InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct | Jul 8, 2024 | Code Generation, Code Summarization | Code Available | 1 | 5 |
| Fault-Aware Neural Code Rankers | Jun 4, 2022 | Code Generation, HumanEval | Code Available | 1 | 5 |
| Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | May 20, 2025 | HumanEval, MBPP | Code Available | 1 | 5 |
| HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks | Oct 16, 2024 | Code Generation, HumanEval | Code Available | 1 | 5 |