HumanEval

Papers

Showing 201–250 of 264 papers

Title	Date	Tasks	Status	Hype
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness	Jan 29, 2024	HumanEval	Unverified	0
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions	Jan 17, 2024	Arithmetic Reasoning, Code Generation	Code Available	1
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models	Jan 15, 2024	HumanEval, Language Modelling	Code Available	0
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models	Jan 12, 2024	Code Generation, HumanEval	Code Available	1
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs	Jan 11, 2024	Code Generation, HumanEval	Unverified	0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs	Jan 8, 2024	Code Generation, Diversity	Unverified	0
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution	Jan 5, 2024	HumanEval, Prediction	Code Available	4
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair	Dec 25, 2023	HumanEval, parameter-efficient fine-tuning	Code Available	1
Instruction Fusion: Advancing Prompt Evolution through Hybridization	Dec 25, 2023	Code Generation, HumanEval	Code Available	0
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation	Dec 20, 2023	Code Generation, HumanEval	Code Available	2
A Review of Repository Level Prompting for LLMs	Dec 15, 2023	Code Completion, Code Generation	Unverified	0
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data	Dec 5, 2023	Code Generation, HumanEval	Unverified	0
Magicoder: Empowering Code Generation with OSS-Instruct	Dec 4, 2023	Code Generation, HumanEval	Code Available	4
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion	Nov 13, 2023	Code Completion, HumanEval	Unverified	0
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples	Nov 8, 2023	HumanEval, MMLU	Code Available	2
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation	Oct 28, 2023	Code Generation, HumanEval	Code Available	0
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion	Oct 17, 2023	Code Completion, HumanEval	Code Available	1
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation	Oct 16, 2023	Code Generation, HumanEval	Unverified	0
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules	Oct 13, 2023	Code Generation, HumanEval	Code Available	1
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model	Oct 10, 2023	Code Generation, Code Translation	Unverified	0
The Program Testing Ability of Large Language Models for Code	Oct 9, 2023	HumanEval, mbpp	Unverified	0
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models	Oct 6, 2023	Code Generation, Decision Making	Code Available	2
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration	Oct 3, 2023	Arithmetic Reasoning, Code Generation	Code Available	1
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency	Sep 29, 2023	Code Generation, HumanEval	Code Available	0
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models	Sep 27, 2023	HumanEval, Language Modeling	Code Available	0
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression	Sep 25, 2023	Code Generation, HumanEval	Unverified	0
Baichuan 2: Open Large-scale Language Models	Sep 19, 2023	Feature Engineering, GSM8K	Code Available	4
Can Programming Languages Boost Each Other via Instruction Tuning?	Aug 31, 2023	HumanEval	Code Available	0
Code Llama: Open Foundation Models for Code	Aug 24, 2023	16k, Code Generation	Code Available	6
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation	Aug 17, 2023	Code Generation, Few-Shot Learning	Unverified	0
OctoPack: Instruction Tuning Code Large Language Models	Aug 14, 2023	Code Generation, Code Repair	Code Available	3
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation	Aug 3, 2023	Class-level Code Generation, Code Generation	Code Available	1
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback	Jul 27, 2023	Code Generation, HumanEval	Unverified	0
Predicting Code Coverage without Execution	Jul 25, 2023	HumanEval	Code Available	1
Textbooks Are All You Need	Jun 20, 2023	All, Code Generation	Unverified	0
Is Self-Repair a Silver Bullet for Code Generation?	Jun 16, 2023	Code Generation, HumanEval	Code Available	1
WizardCoder: Empowering Code Large Language Models with Evol-Instruct	Jun 14, 2023	Code Generation, HumanEval	Code Available	5
Large Language Models of Code Fail at Completing Code with Potential Bugs	Jun 6, 2023	Code Completion, HumanEval	Code Available	0
SelfEvolve: A Code Evolution Framework via Large Language Models	Jun 5, 2023	Code Generation, HumanEval	Unverified	0
ANPL: Towards Natural Programming with Interactive Decomposition	May 29, 2023	ARC, Code Generation	Code Available	1
LeTI: Learning to Generate from Textual Interactions	May 17, 2023	Code Generation, Event Argument Extraction	Code Available	1
CodeT5+: Open Code Large Language Models for Code Understanding and Generation	May 13, 2023	Arithmetic Reasoning, Code Completion	Code Available	0
Structured Chain-of-Thought Prompting for Code Generation	May 11, 2023	Code Generation, HumanEval	Unverified	0
StarCoder: may the source be with you!	May 9, 2023	8k, Code Generation	Code Available	5
Self-Edit: Fault-Aware Code Editor for Code Generation	May 6, 2023	Code Generation, HumanEval	Code Available	0
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation	May 2, 2023	Code Generation, HumanEval	Code Available	3
Using Large Language Models to Generate JUnit Tests: An Empirical Study	Apr 30, 2023	Code Generation, HumanEval	Code Available	0
Stochastic Code Generation	Apr 14, 2023	Code Generation, Decoder	Unverified	0
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X	Mar 30, 2023	Benchmarking, Code Generation	Code Available	5
Reflexion: Language Agents with Verbal Reinforcement Learning	Mar 20, 2023	Decision Making, HumanEval	Code Available	4

