SOTAVerified

Code Completion

Papers

Showing 26-50 of 212 papers

Title | Status | Hype
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Code | 2
Building A Coding Assistant via the Retrieval-Augmented Language Model | Code | 1
A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code | Code | 1
How Effective Are Neural Networks for Fixing Security Vulnerabilities | Code | 1
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet' | Code | 1
How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models | Code | 1
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models | Code | 1
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | Code | 1
Long Code Arena: a Set of Benchmarks for Long-Context Code Models | Code | 1
MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning | Code | 1
MPI-rical: Data-Driven MPI Distributed Parallelism Assistance with Transformers | Code | 1
Learning Deep Semantics for Test Completion | Code | 1
CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences | Code | 1
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations | Code | 1
Empirical Study of Transformers for Source Code | Code | 1
Adversarial Robustness for Code | Code | 1
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models | Code | 1
Energy-Based Models for Code Generation under Compilability Constraints | Code | 1
Language Models for Code Completion: A Practical Evaluation | Code | 1
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models | Code | 1
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context | Code | 1
Page 2 of 9

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | deepseek-coder-33b-base | Average | 69.01 | | Unverified
2 | deepseek-coder-6.7b-base | Average | 63.4 | | Unverified
3 | starcoderbase | Average | 55.54 | | Unverified
4 | gpt-4-1106-preview | Average | 53.28 | | Unverified
5 | CodeLlama-13b-hf | Average | 52.78 | | Unverified
6 | deepseek-coder-1.3b-base | Average | 52.63 | | Unverified
7 | CodeLlama-34b-hf | Average | 49.66 | | Unverified
8 | CodeLlama-7b-hf | Average | 45 | | Unverified
9 | gpt-3.5-turbo-0301 | Average | 40.86 | | Unverified
10 | incoder-6B | Average | 33.79 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CodeGPT-adapted | Accuracy (token-level) | 77.13 | | Unverified
2 | CodeT5+ 770M | EM (line-level) | 37.9 | | Unverified
3 | CodeT5+ 220M | EM (line-level) | 35.17 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CodeGPT-adapted | Accuracy (token-level) | 75.11 | | Unverified
2 | CodeT5+ 770M | EM (line-level) | 44.86 | | Unverified
3 | CodeT5+ 220M | EM (line-level) | 43.42 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SantaCoder-MGD | Compilation Rate | 73.03 | | Unverified
2 | SantaCoder | Compilation Rate | 59.97 | | Unverified
3 | SantaCoder | Compilation Rate | 59.79 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Rambo | Compilation Rate | 76.47 | | Unverified
2 | RepoCoder | Compilation Rate | 74.02 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Rambo | Compilation Rate | 61.7 | | Unverified
2 | RepoCoder | Compilation Rate | 58.09 | | Unverified