HumanEval

Papers

Showing 76–100 of 264 papers

Title	Date	Tasks	Status	Hype
Better & Faster Large Language Models via Multi-token Prediction	Apr 30, 2024	HumanEval, MBPP	Code Available	1
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts	Apr 23, 2024	HumanEval, MBPP	Code Available	1
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers	Apr 3, 2024	HumanEval	Code Available	1
Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization	Apr 2, 2024	Code Generation, HumanEval	Code Available	1
CYCLE: Learning to Self-Refine the Code Generation	Mar 27, 2024	Code Generation, HumanEval	Code Available	1
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models	Mar 11, 2024	Code Generation, HumanEval	Code Available	1
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization	Feb 26, 2024	Code Generation, HumanEval	Code Available	1
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models	Feb 24, 2024	HumanEval, Memorization	Code Available	1
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding	Feb 19, 2024	HumanEval, Language Modeling	Code Available	1
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning	Feb 14, 2024	Code Generation, HumanEval	Code Available	1
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness	Feb 13, 2024	HumanEval, MBPP	Code Available	1
Getting the most out of your tokenizer for pre-training and domain adaptation	Feb 1, 2024	Code Generation, Domain Adaptation	Code Available	1
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions	Jan 17, 2024	Arithmetic Reasoning, Code Generation	Code Available	1
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models	Jan 12, 2024	Code Generation, HumanEval	Code Available	1
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair	Dec 25, 2023	HumanEval, parameter-efficient fine-tuning	Code Available	1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion	Oct 17, 2023	Code Completion, HumanEval	Code Available	1
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules	Oct 13, 2023	Code Generation, HumanEval	Code Available	1
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration	Oct 3, 2023	Arithmetic Reasoning, Code Generation	Code Available	1
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation	Aug 3, 2023	Class-level Code Generation, Code Generation	Code Available	1
Predicting Code Coverage without Execution	Jul 25, 2023	HumanEval	Code Available	1
Is Self-Repair a Silver Bullet for Code Generation?	Jun 16, 2023	Code Generation, HumanEval	Code Available	1
ANPL: Towards Natural Programming with Interactive Decomposition	May 29, 2023	ARC, Code Generation	Code Available	1
LeTI: Learning to Generate from Textual Interactions	May 17, 2023	Code Generation, Event Argument Extraction	Code Available	1
ReCode: Robustness Evaluation of Code Generation Models	Dec 20, 2022	Code Generation, HumanEval	Code Available	1
Multi-lingual Evaluation of Code Generation Models	Oct 26, 2022	Code Completion, Code Generation	Code Available	1

No leaderboard results yet.