HumanEval

Papers

Showing 201–225 of 264 papers

Title	Date	Tasks	Status	Hype
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness	Jan 29, 2024	HumanEval	Unverified	0
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions	Jan 17, 2024	Arithmetic Reasoning, Code Generation	Code Available	1
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models	Jan 15, 2024	HumanEval, Language Modelling	Code Available	0
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models	Jan 12, 2024	Code Generation, HumanEval	Code Available	1
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs	Jan 11, 2024	Code Generation, HumanEval	Unverified	0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs	Jan 8, 2024	Code Generation, Diversity	Unverified	0
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution	Jan 5, 2024	HumanEval, Prediction	Code Available	4
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair	Dec 25, 2023	HumanEval, parameter-efficient fine-tuning	Code Available	1
Instruction Fusion: Advancing Prompt Evolution through Hybridization	Dec 25, 2023	Code Generation, HumanEval	Code Available	0
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation	Dec 20, 2023	Code Generation, HumanEval	Code Available	2
A Review of Repository Level Prompting for LLMs	Dec 15, 2023	Code Completion, Code Generation	Unverified	0
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data	Dec 5, 2023	Code Generation, HumanEval	Unverified	0
Magicoder: Empowering Code Generation with OSS-Instruct	Dec 4, 2023	Code Generation, HumanEval	Code Available	4
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion	Nov 13, 2023	Code Completion, HumanEval	Unverified	0
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples	Nov 8, 2023	HumanEval, MMLU	Code Available	2
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation	Oct 28, 2023	Code Generation, HumanEval	Code Available	0
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion	Oct 17, 2023	Code Completion, HumanEval	Code Available	1
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation	Oct 16, 2023	Code Generation, HumanEval	Unverified	0
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules	Oct 13, 2023	Code Generation, HumanEval	Code Available	1
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model	Oct 10, 2023	Code Generation, Code Translation	Unverified	0
The Program Testing Ability of Large Language Models for Code	Oct 9, 2023	HumanEval, mbpp	Unverified	0
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models	Oct 6, 2023	Code Generation, Decision Making	Code Available	2
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration	Oct 3, 2023	Arithmetic Reasoning, Code Generation	Code Available	1
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency	Sep 29, 2023	Code Generation, HumanEval	Code Available	0
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models	Sep 27, 2023	HumanEval, Language Modeling	Code Available	0


No leaderboard results yet.