SOTAVerified

Code Completion

Papers

Showing 51–75 of 212 papers

Title | Status | Hype
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | Code | 1
RAMBO: Enhancing RAG-based Repository-Level Method Body Completion | Code | 1
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming | Code | 1
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1
Execution-based Code Generation using Deep Reinforcement Learning | Code | 1
A Syntax-Guided Edit Decoder for Neural Program Repair | Code | 1
Building A Coding Assistant via the Retrieval-Augmented Language Model | Code | 1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet' | Code | 1
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks | Code | 1
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | Code | 1
How Effective Are Neural Networks for Fixing Security Vulnerabilities | Code | 1
DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning | Code | 1
Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion | Code | 1
Can Large Language Models Write Parallel Code? | Code | 1
Language Models for Code Completion: A Practical Evaluation | Code | 1
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | Code | 1
Multi-lingual Evaluation of Code Generation Models | Code | 1
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning | Code | 1
Empirical Study of Transformers for Source Code | Code | 1
MERGE: Fast Private Text Generation | Code | 0
Breaking the Silence: the Threats of Using LLMs in Software Engineering | Code | 0
LongCoder: A Long-Range Pre-trained Language Model for Code Completion | Code | 0
Page 3 of 9

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | deepseek-coder-33b-base | Average | 69.01 | — | Unverified
2 | deepseek-coder-6.7b-base | Average | 63.4 | — | Unverified
3 | starcoderbase | Average | 55.54 | — | Unverified
4 | gpt-4-1106-preview | Average | 53.28 | — | Unverified
5 | CodeLlama-13b-hf | Average | 52.78 | — | Unverified
6 | deepseek-coder-1.3b-base | Average | 52.63 | — | Unverified
7 | CodeLlama-34b-hf | Average | 49.66 | — | Unverified
8 | CodeLlama-7b-hf | Average | 45 | — | Unverified
9 | gpt-3.5-turbo-0301 | Average | 40.86 | — | Unverified
10 | incoder-6B | Average | 33.79 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CodeGPT-adapted | Accuracy (token-level) | 77.13 | — | Unverified
2 | CodeT5+ 770M | EM (line-level) | 37.9 | — | Unverified
3 | CodeT5+ 220M | EM (line-level) | 35.17 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CodeGPT-adapted | Accuracy (token-level) | 75.11 | — | Unverified
2 | CodeT5+ 770M | EM (line-level) | 44.86 | — | Unverified
3 | CodeT5+ 220M | EM (line-level) | 43.42 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SantaCoder-MGD | Compilation Rate | 73.03 | — | Unverified
2 | SantaCoder | Compilation Rate | 59.97 | — | Unverified
3 | SantaCoder | Compilation Rate | 59.79 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Rambo | Compilation Rate | 76.47 | — | Unverified
2 | RepoCoder | Compilation Rate | 74.02 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Rambo | Compilation Rate | 61.7 | — | Unverified
2 | RepoCoder | Compilation Rate | 58.09 | — | Unverified