SOTAVerified

Code Completion

Papers

Showing 150 of 212 papers

Title | Status | Hype
Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework | Code | 9
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing | Code | 7
StarCoder 2 and The Stack v2: The Next Generation | Code | 7
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Code | 5
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | Code | 5
Seed-Coder: Let the Code Model Curate Data for Itself | Code | 4
Scaling Granite Code Models to 128K Context | Code | 4
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | Code | 4
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | Code | 4
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding | Code | 3
Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation | Code | 3
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards | Code | 3
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3
Guiding Language Models of Code with Global Context using Monitors | Code | 2
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Code | 2
Optimizing Large Language Models for OpenAPI Code Completion | Code | 2
EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2
LangBridge: Multilingual Reasoning Without Multilingual Supervision | Code | 2
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion | Code | 2
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection | Code | 2
CursorCore: Assist Programming through Aligning Anything | Code | 2
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | Code | 2
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation | Code | 2
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? | Code | 2
RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion | Code | 2
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Code | 2
Building A Coding Assistant via the Retrieval-Augmented Language Model | Code | 1
A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code | Code | 1
How Effective Are Neural Networks for Fixing Security Vulnerabilities | Code | 1
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet' | Code | 1
How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models | Code | 1
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models | Code | 1
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | Code | 1
Long Code Arena: a Set of Benchmarks for Long-Context Code Models | Code | 1
MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning | Code | 1
MPI-rical: Data-Driven MPI Distributed Parallelism Assistance with Transformers | Code | 1
Learning Deep Semantics for Test Completion | Code | 1
CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences | Code | 1
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations | Code | 1
Empirical Study of Transformers for Source Code | Code | 1
Adversarial Robustness for Code | Code | 1
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models | Code | 1
Energy-Based Models for Code Generation under Compilability Constraints | Code | 1
Language Models for Code Completion: A Practical Evaluation | Code | 1
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models | Code | 1
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | deepseek-coder-33b-base | Average | 69.01 |  | Unverified
2 | deepseek-coder-6.7b-base | Average | 63.4 |  | Unverified
3 | starcoderbase | Average | 55.54 |  | Unverified
4 | gpt-4-1106-preview | Average | 53.28 |  | Unverified
5 | CodeLlama-13b-hf | Average | 52.78 |  | Unverified
6 | deepseek-coder-1.3b-base | Average | 52.63 |  | Unverified
7 | CodeLlama-34b-hf | Average | 49.66 |  | Unverified
8 | CodeLlama-7b-hf | Average | 45 |  | Unverified
9 | gpt-3.5-turbo-0301 | Average | 40.86 |  | Unverified
10 | incoder-6B | Average | 33.79 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CodeGPT-adapted | Accuracy (token-level) | 77.13 |  | Unverified
2 | CodeT5+ 770M | EM (line-level) | 37.9 |  | Unverified
3 | CodeT5+ 220M | EM (line-level) | 35.17 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CodeGPT-adapted | Accuracy (token-level) | 75.11 |  | Unverified
2 | CodeT5+ 770M | EM (line-level) | 44.86 |  | Unverified
3 | CodeT5+ 220M | EM (line-level) | 43.42 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SantaCoder-MGD | Compilation Rate | 73.03 |  | Unverified
2 | SantaCoder | Compilation Rate | 59.97 |  | Unverified
3 | SantaCoder | Compilation Rate | 59.79 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Rambo | Compilation Rate | 76.47 |  | Unverified
2 | RepoCoder | Compilation Rate | 74.02 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Rambo | Compilation Rate | 61.7 |  | Unverified
2 | RepoCoder | Compilation Rate | 58.09 |  | Unverified