HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–150 of 264 papers

Title	Date	Tasks	Status	Hype
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	FairnessGSM8K	CodeCode Available	0
KV Prediction for Improved Time to First Token	Oct 10, 2024	Code CompletionCPU	—Unverified	0
Context-Augmented Code Generation Using Programming Knowledge Graphs	Oct 9, 2024	Code GenerationHumanEval	—Unverified	0
AIME: AI System Optimization via Multiple LLM Evaluators	Oct 4, 2024	Code GenerationHumanEval	—Unverified	0
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis	Oct 3, 2024	HumanEvalSynthetic Data Generation	CodeCode Available	1
RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance	Oct 2, 2024	Code GenerationHumanEval	CodeCode Available	0
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging	Oct 2, 2024	Auto DebuggingBug fixing	CodeCode Available	2
AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation	Oct 1, 2024	Code GenerationHumanEval	CodeCode Available	0
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity	Sep 24, 2024	Code GenerationContrastive Learning	—Unverified	0
Training Language Models to Self-Correct via Reinforcement Learning	Sep 19, 2024	HumanEvalMath	CodeCode Available	2
GRIN: GRadient-INformed MoE	Sep 18, 2024	HellaSwagHumanEval	—Unverified	0
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation	Sep 15, 2024	Code GenerationHumanEval	—Unverified	0
Measuring the Influence of Incorrect Code on Test Generation	Sep 14, 2024	HumanEvalLarge Language Model	CodeCode Available	0
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks	Sep 13, 2024	ARCCode Generation	—Unverified	0
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation	Sep 11, 2024	Code GenerationHumanEval	CodeCode Available	1
USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding	Sep 9, 2024	Code GenerationHumanEval	—Unverified	0
Multi-Programming Language Ensemble for Code Generation in Large Language Model	Sep 6, 2024	Code GenerationHumanEval	CodeCode Available	0
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data	Sep 5, 2024	Code GenerationDiversity	CodeCode Available	1
Planning In Natural Language Improves LLM Search For Code Generation	Sep 5, 2024	Code GenerationDiversity	CodeCode Available	1
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining	Sep 3, 2024	Code GenerationHumanEval	—Unverified	0
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation	Aug 23, 2024	Code GenerationHumanEval	—Unverified	0
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution	Aug 23, 2024	Code GenerationHumanEval	—Unverified	0
AutoTest: Evolutionary Code Solution Selection with Test Cases	Aug 22, 2024	Code GenerationHumanEval	—Unverified	0
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs	Aug 18, 2024	DiversityGPU	—Unverified	0
Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting	Aug 18, 2024	HumanEvalMathematical Reasoning	—Unverified	0
CodeMirage: Hallucinations in Code Generated by Large Language Models	Aug 14, 2024	Code GenerationHallucination	—Unverified	0
CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding	Aug 8, 2024	HumanEvalRetrieval	—Unverified	0
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases	Aug 7, 2024	HumanEvalmbpp	CodeCode Available	7
ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models	Aug 2, 2024	Code GenerationHumanEval	CodeCode Available	1
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models	Jul 30, 2024	BenchmarkingCode Completion	—Unverified	0
Discrete Flow Matching	Jul 22, 2024	HumanEvalmbpp	—Unverified	0
Scaling Granite Code Models to 128K Context	Jul 18, 2024	2k4k	CodeCode Available	4
Qwen2 Technical Report	Jul 15, 2024	Arithmetic ReasoningGSM8K	CodeCode Available	13
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants	Jul 12, 2024	HumanEval	—Unverified	0
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct	Jul 8, 2024	Code GenerationCode Summarization	CodeCode Available	1
Brevity is the soul of wit: Pruning long files for code generation	Jun 29, 2024	Code GenerationHumanEval	—Unverified	0
Towards Large Language Model Aided Program Refinement	Jun 26, 2024	HumanEvalLanguage Modeling	—Unverified	0
RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale	Jun 24, 2024	Code GenerationHumanEval	CodeCode Available	1
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models	Jun 20, 2024	Code GenerationHumanEval	—Unverified	0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency	Jun 18, 2024	HumanEvalmbpp	—Unverified	0
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	AllGSM8K	CodeCode Available	14
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual LearningGSM8K	CodeCode Available	0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results	Jun 15, 2024	BenchmarkingHumanEval	—Unverified	0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases	Jun 11, 2024	Code GenerationHumanEval	—Unverified	0
Validating LLM-Generated Programs with Metamorphic Prompt Testing	Jun 11, 2024	HumanEval	—Unverified	0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	BenchmarkingCode Generation	CodeCode Available	0
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark	Jun 10, 2024	HumanEvalProgram Synthesis	CodeCode Available	1
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8KHumanEval	—Unverified	0
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning	Jun 3, 2024	Code CompletionCode Generation	CodeCode Available	1
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8KHumanEval	CodeCode Available	3

Show:10 25 50

← PrevPage 3 of 6Next →

No leaderboard results yet.