SOTAVerified

Code Completion

Papers


Showing 11–20 of 212 papers

Title	Date	Tasks	Status	Hype
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards	Jul 4, 2024	Code Completion	Code Available	3
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models	Mar 26, 2024	Code Completion, Few-Shot Learning	Code Available	3
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding	Aug 28, 2023	16k, Code Completion	Code Available	3
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development	May 22, 2025	Bug fixing, Chatbot	Code Available	2
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification	Feb 24, 2025	Code Completion	Code Available	2
CursorCore: Assist Programming through Aligning Anything	Oct 9, 2024	Code Completion	Code Available	2
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?	Oct 2, 2024	Code Completion, Code Generation	Code Available	2
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection	Jun 10, 2024	Backdoor Attack, Code Completion	Code Available	2
Optimizing Large Language Models for OpenAPI Code Completion	May 24, 2024	Code Completion, Code Generation	Code Available	2
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion	Mar 12, 2024	Code Completion, Safety Alignment	Code Available	2

Page 2 of 22

Datasets: SAFIM, CodeXGLUE - Github Java Corpus, CodeXGLUE - PY150, DotPrompts, Defects4J, Rambo Benchmark

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	deepseek-coder-33b-base	Average	69.01	—	Unverified
2	deepseek-coder-6.7b-base	Average	63.4	—	Unverified
3	starcoderbase	Average	55.54	—	Unverified
4	gpt-4-1106-preview	Average	53.28	—	Unverified
5	CodeLlama-13b-hf	Average	52.78	—	Unverified
6	deepseek-coder-1.3b-base	Average	52.63	—	Unverified
7	CodeLlama-34b-hf	Average	49.66	—	Unverified
8	CodeLlama-7b-hf	Average	45	—	Unverified
9	gpt-3.5-turbo-0301	Average	40.86	—	Unverified
10	incoder-6B	Average	33.79	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CodeGPT-adapted	Accuracy (token-level)	77.13	—	Unverified
2	CodeT5+ 770M	EM (line-level)	37.9	—	Unverified
3	CodeT5+ 220M	EM (line-level)	35.17	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CodeGPT-adapted	Accuracy (token-level)	75.11	—	Unverified
2	CodeT5+ 770M	EM (line-level)	44.86	—	Unverified
3	CodeT5+ 220M	EM (line-level)	43.42	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SantaCoder-MGD	Compilation Rate	73.03	—	Unverified
2	SantaCoder	Compilation Rate	59.97	—	Unverified
3	SantaCoder	Compilation Rate	59.79	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Rambo	Compilation Rate	76.47	—	Unverified
2	RepoCoder	Compilation Rate	74.02	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Rambo	Compilation Rate	61.7	—	Unverified
2	RepoCoder	Compilation Rate	58.09	—	Unverified