SOTAVerified

HumanEval

Papers

Showing 201–210 of 264 papers

| Title | Status | Hype |
| --- | --- | --- |
| NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | | 0 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Code | 1 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Code | 0 |
| OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models | Code | 1 |
| Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | | 0 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Code | 4 |
| RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | Code | 1 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Code | 0 |
| AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Code | 2 |
Page 21 of 27

No leaderboard results yet.