HumanEval

Papers

Showing 176–200 of 264 papers

Title	Date	Tasks	Status	Hype
mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Oct 19, 2024	Code Generation, Diversity	Code Available	0
CELI: Controller-Embedded Language Model Interactions	Oct 18, 2024	Articles, Code Generation	Unverified	0
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks	Oct 15, 2024	HumanEval, Language Modelling	Unverified	0
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	Fairness, GSM8K	Code Available	0
KV Prediction for Improved Time to First Token	Oct 10, 2024	Code Completion, CPU	Unverified	0
Context-Augmented Code Generation Using Programming Knowledge Graphs	Oct 9, 2024	Code Generation, HumanEval	Unverified	0
AIME: AI System Optimization via Multiple LLM Evaluators	Oct 4, 2024	Code Generation, HumanEval	Unverified	0
RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance	Oct 2, 2024	Code Generation, HumanEval	Code Available	0
AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation	Oct 1, 2024	Code Generation, HumanEval	Code Available	0
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity	Sep 24, 2024	Code Generation, Contrastive Learning	Unverified	0
GRIN: GRadient-INformed MoE	Sep 18, 2024	HellaSwag, HumanEval	Unverified	0
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation	Sep 15, 2024	Code Generation, HumanEval	Unverified	0
Measuring the Influence of Incorrect Code on Test Generation	Sep 14, 2024	HumanEval, Large Language Model	Code Available	0
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks	Sep 13, 2024	ARC, Code Generation	Unverified	0
USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding	Sep 9, 2024	Code Generation, HumanEval	Unverified	0
Multi-Programming Language Ensemble for Code Generation in Large Language Model	Sep 6, 2024	Code Generation, HumanEval	Code Available	0
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining	Sep 3, 2024	Code Generation, HumanEval	Unverified	0
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution	Aug 23, 2024	Code Generation, HumanEval	Unverified	0
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation	Aug 23, 2024	Code Generation, HumanEval	Unverified	0
AutoTest: Evolutionary Code Solution Selection with Test Cases	Aug 22, 2024	Code Generation, HumanEval	Unverified	0
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs	Aug 18, 2024	Diversity, GPU	Unverified	0
Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting	Aug 18, 2024	HumanEval, Mathematical Reasoning	Unverified	0
CodeMirage: Hallucinations in Code Generated by Large Language Models	Aug 14, 2024	Code Generation, Hallucination	Unverified	0
CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding	Aug 8, 2024	HumanEval, Retrieval	Unverified	0
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models	Jul 30, 2024	Benchmarking, Code Completion	Unverified	0

No leaderboard results yet.