SOTAVerified

HumanEval

Papers

Showing 51-60 of 264 papers

Title | Status | Hype
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks | Code | 1
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization | Code | 1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | Code | 1
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Code | 1
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models | Code | 1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Code | 1
Getting the most out of your tokenizer for pre-training and domain adaptation | Code | 1
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data | Code | 1

No leaderboard results yet.