| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8K, HumanEval | Code Available | 1 |
| MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation | May 19, 2024 | Code Generation, HumanEval | Code Available | 1 |
| MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | May 18, 2024 | Code Generation, HumanEval | Code Available | 2 |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 |
| NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | May 7, 2024 | HumanEval, mbpp | Code Available | 2 |
| Better & Faster Large Language Models via Multi-token Prediction | Apr 30, 2024 | HumanEval, mbpp | Code Available | 1 |
| On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | Apr 26, 2024 | Code Generation, HumanEval | Unverified | 0 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Apr 25, 2024 | GSM8K, HellaSwag | Code Available | 3 |
| BASS: Batched Attention-optimized Speculative Sampling | Apr 24, 2024 | GPU, HumanEval | Unverified | 0 |
| XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Apr 23, 2024 | HumanEval, mbpp | Code Available | 1 |