SOTAVerified

HumanEval

Papers

Showing 201-250 of 264 papers

Title | Status | Hype
Discrete Flow Matching | - | 0
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | - | 0
Brevity is the soul of wit: Pruning long files for code generation | - | 0
Towards Large Language Model Aided Program Refinement | - | 0
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | - | 0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | - | 0
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | - | 0
Validating LLM-Generated Programs with Metamorphic Prompt Testing | - | 0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | - | 0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training | - | 0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | - | 0
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | - | 0
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | - | 0
Kotlin ML Pack: Technical Report | - | 0
Can Github issues be solved with Tree Of Thoughts? | Code | 0
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | - | 0
BASS: Batched Attention-optimized Speculative Sampling | - | 0
NExT: Teaching Large Language Models to Reason about Code Execution | - | 0
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | - | 0
Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Code | 0
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | - | 0
Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | - | 0
CodeShell Technical Report | - | 0
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | - | 0
Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study | Code | 0
Software Vulnerability and Functionality Assessment using LLMs | - | 0
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | - | 0
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | - | 0
Test-Driven Development for Code Generation | - | 0
HumanEval on Latest GPT Models -- 2024 | Code | 0
Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | - | 0
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | - | 0
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Code | 0
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | - | 0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | - | 0
Instruction Fusion: Advancing Prompt Evolution through Hybridization | Code | 0
A Review of Repository Level Prompting for LLMs | - | 0
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | - | 0
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | - | 0
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Code | 0
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | - | 0
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | - | 0
The Program Testing Ability of Large Language Models for Code | - | 0
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Code | 0
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Code | 0
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression | - | 0
Can Programming Languages Boost Each Other via Instruction Tuning? | Code | 0
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation | - | 0
Page 5 of 6

No leaderboard results yet.