SOTAVerified

Code Completion

Papers

Showing 1-50 of 212 papers

| Title | Status | Hype |
| --- | --- | --- |
| Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework | Code | 9 |
| aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing | Code | 7 |
| StarCoder 2 and The Stack v2: The Next Generation | Code | 7 |
| Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | Code | 5 |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Code | 5 |
| Seed-Coder: Let the Code Model Curate Data for Itself | Code | 4 |
| Scaling Granite Code Models to 128K Context | Code | 4 |
| AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | Code | 4 |
| Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | Code | 4 |
| Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation | Code | 3 |
| On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards | Code | 3 |
| PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3 |
| LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding | Code | 3 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Code | 2 |
| LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Code | 2 |
| CursorCore: Assist Programming through Aligning Anything | Code | 2 |
| Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? | Code | 2 |
| An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection | Code | 2 |
| Optimizing Large Language Models for OpenAPI Code Completion | Code | 2 |
| CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion | Code | 2 |
| RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion | Code | 2 |
| EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2 |
| StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | Code | 2 |
| LangBridge: Multilingual Reasoning Without Multilingual Supervision | Code | 2 |
| Guiding Language Models of Code with Global Context using Monitors | Code | 2 |
| RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation | Code | 2 |
| LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1 |
| CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1 |
| How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1 |
| GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models | Code | 1 |
| Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet' | Code | 1 |
| Building A Coding Assistant via the Retrieval-Augmented Language Model | Code | 1 |
| RAMBO: Enhancing RAG-based Repository-Level Method Body Completion | Code | 1 |
| DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning | Code | 1 |
| Security Attacks on LLM-based Code Completion Tools | Code | 1 |
| Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | Code | 1 |
| Long Code Arena: a Set of Benchmarks for Long-Context Code Models | Code | 1 |
| VersiCode: Towards Version-controllable Code Generation | Code | 1 |
| SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning | Code | 1 |
| Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion | Code | 1 |
| Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | Code | 1 |
| Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks | Code | 1 |
| IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators | Code | 1 |
| CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models | Code | 1 |
| Language Models for Code Completion: A Practical Evaluation | Code | 1 |
| Can Large Language Models Write Parallel Code? | Code | 1 |
| CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1 |
| Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | Code | 1 |
| BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models | Code | 1 |
| LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models | Code | 1 |

Benchmark Results

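| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | deepseek-coder-33b-base | Average | 69.01 | | Unverified |
| 2 | deepseek-coder-6.7b-base | Average | 63.4 | | Unverified |
| 3 | starcoderbase | Average | 55.54 | | Unverified |
| 4 | gpt-4-1106-preview | Average | 53.28 | | Unverified |
| 5 | CodeLlama-13b-hf | Average | 52.78 | | Unverified |
| 6 | deepseek-coder-1.3b-base | Average | 52.63 | | Unverified |
| 7 | CodeLlama-34b-hf | Average | 49.66 | | Unverified |
| 8 | CodeLlama-7b-hf | Average | 45 | | Unverified |
| 9 | gpt-3.5-turbo-0301 | Average | 40.86 | | Unverified |
| 10 | incoder-6B | Average | 33.79 | | Unverified |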
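
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CodeGPT-adapted | Accuracy (token-level) | 77.13 | | Unverified |
| 2 | CodeT5+ 770M | EM (line-level) | 37.9 | | Unverified |
| 3 | CodeT5+ 220M | EM (line-level) | 35.17 | | Unverified |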
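
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CodeGPT-adapted | Accuracy (token-level) | 75.11 | | Unverified |
| 2 | CodeT5+ 770M | EM (line-level) | 44.86 | | Unverified |
| 3 | CodeT5+ 220M | EM (line-level) | 43.42 | | Unverified |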
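
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SantaCoder-MGD | Compilation Rate | 73.03 | | Unverified |
| 2 | SantaCoder | Compilation Rate | 59.97 | | Unverified |
| 3 | SantaCoder | Compilation Rate | 59.79 | | Unverified |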

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Rambo | Compilation Rate | 76.47 | | Unverified |
| 2 | RepoCoder | Compilation Rate | 74.02 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Rambo | Compilation Rate | 61.7 | | Unverified |
| 2 | RepoCoder | Compilation Rate | 58.09 | | Unverified |