SOTAVerified

HumanEval

Papers

Showing 201–225 of 264 papers

Title | Status | Hype
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | - | 0
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Code | 1
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Code | 0
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models | Code | 1
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | - | 0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | - | 0
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Code | 4
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | Code | 1
Instruction Fusion: Advancing Prompt Evolution through Hybridization | Code | 0
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Code | 2
A Review of Repository Level Prompting for LLMs | - | 0
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | - | 0
Magicoder: Empowering Code Generation with OSS-Instruct | Code | 4
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | - | 0
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Code | 2
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Code | 0
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | - | 0
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Code | 1
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | - | 0
The Program Testing Ability of Large Language Models for Code | - | 0
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Code | 2
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Code | 1
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Code | 0
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Code | 0

No leaderboard results yet.