SOTAVerified

HumanEval

Papers

Showing 81–90 of 264 papers

Title | Status | Hype
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models | Code | 1
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Code | 1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1
Instruction Tuning With Loss Over Instructions | Code | 1
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Code | 1
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation | Code | 1
Better & Faster Large Language Models via Multi-token Prediction | Code | 1
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models | Code | 1
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Code | 1
ContraCLM: Contrastive Learning For Causal Language Model | Code | 1

No leaderboard results yet.