SOTAVerified

HumanEval

Papers

Showing 201–250 of 264 papers

| Title | Status | Hype |
| --- | --- | --- |
| NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | | 0 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Code | 1 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Code | 0 |
| OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models | Code | 1 |
| Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | | 0 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Code | 4 |
| RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | Code | 1 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Code | 0 |
| AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Code | 2 |
| A Review of Repository Level Prompting for LLMs | | 0 |
| Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | | 0 |
| Magicoder: Empowering Code Generation with OSS-Instruct | Code | 4 |
| Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | | 0 |
| Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Code | 2 |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Code | 0 |
| CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1 |
| Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | | 0 |
| CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Code | 1 |
| CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | | 0 |
| The Program Testing Ability of Large Language Models for Code | | 0 |
| Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Code | 2 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Code | 1 |
| Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Code | 0 |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Code | 0 |
| LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression | | 0 |
| Baichuan 2: Open Large-scale Language Models | Code | 4 |
| Can Programming Languages Boost Each Other via Instruction Tuning? | Code | 0 |
| Code Llama: Open Foundation Models for Code | Code | 6 |
| CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation | | 0 |
| OctoPack: Instruction Tuning Code Large Language Models | Code | 3 |
| ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation | Code | 1 |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | | 0 |
| Predicting Code Coverage without Execution | Code | 1 |
| Textbooks Are All You Need | | 0 |
| Is Self-Repair a Silver Bullet for Code Generation? | Code | 1 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Code | 5 |
| Large Language Models of Code Fail at Completing Code with Potential Bugs | Code | 0 |
| SelfEvolve: A Code Evolution Framework via Large Language Models | | 0 |
| ANPL: Towards Natural Programming with Interactive Decomposition | Code | 1 |
| LeTI: Learning to Generate from Textual Interactions | Code | 1 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | Code | 0 |
| Structured Chain-of-Thought Prompting for Code Generation | | 0 |
| StarCoder: may the source be with you! | Code | 5 |
| Self-Edit: Fault-Aware Code Editor for Code Generation | Code | 0 |
| Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | Code | 3 |
| Using Large Language Models to Generate JUnit Tests: An Empirical Study | Code | 0 |
| Stochastic Code Generation | | 0 |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Code | 5 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Code | 4 |
Page 5 of 6

No leaderboard results yet.