SOTAVerified

Code Completion

Papers

Showing 1–10 of 212 papers

Title	Date	Tasks	Status	Hype
Beyond Autocomplete: Designing CopilotLens Towards Transparent and Explainable AI Coding Agents	Jun 24, 2025	Code Completion, Decision Making	Unverified	0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models	Jun 23, 2025	Code Completion, GSM8K	Unverified	0
Seed-Coder: Let the Code Model Curate Data for Itself	Jun 4, 2025	Code Completion, Code Generation	Code Available	4
HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding	May 28, 2025	Code Completion, Code Generation	Unverified	0
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development	May 22, 2025	Bug fixing, Chatbot	Code Available	2
Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification	May 19, 2025	Code Completion, Question Answering	Unverified	0
Structure-Aware Corpus Construction and User-Perception-Aligned Metrics for Large-Language-Model Code Completion	May 19, 2025	Code Completion, Language Modeling	Unverified	0
Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective	May 15, 2025	Code Completion, Code Generation	Code Available	0
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts	May 8, 2025	Code Completion, Code Generation	Unverified	0
Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents	May 6, 2025	All, Code Completion	Unverified	0

Datasets: SAFIM, CodeXGLUE - Github Java Corpus, CodeXGLUE - PY150, DotPrompts, Defects4J, Rambo Benchmark

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	deepseek-coder-33b-base	Average	69.01	—	Unverified
2	deepseek-coder-6.7b-base	Average	63.4	—	Unverified
3	starcoderbase	Average	55.54	—	Unverified
4	gpt-4-1106-preview	Average	53.28	—	Unverified
5	CodeLlama-13b-hf	Average	52.78	—	Unverified
6	deepseek-coder-1.3b-base	Average	52.63	—	Unverified
7	CodeLlama-34b-hf	Average	49.66	—	Unverified
8	CodeLlama-7b-hf	Average	45	—	Unverified
9	gpt-3.5-turbo-0301	Average	40.86	—	Unverified
10	incoder-6B	Average	33.79	—	Unverified