SOTAVerified

HumanEval

Papers

Showing 101-150 of 264 papers

Title | Status | Hype
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models | Code | 1
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Code | 1
Software Vulnerability and Functionality Assessment using LLMs | | 0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis | | 0
Actor-Critic based Online Data Mixing For Language Model Pre-Training | | 0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | | 0
Addressing Data Leakage in HumanEval Using Combinatorial Test Design | | 0
AIME: AI System Optimization via Multiple LLM Evaluators | | 0
Aligning CodeLLMs with Direct Preference Optimization | | 0
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | | 0
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | | 0
A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks | | 0
ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement | | 0
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining | | 0
A Review of Repository Level Prompting for LLMs | | 0
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | | 0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0
AutoTest: Evolutionary Code Solution Selection with Test Cases | | 0
BASS: Batched Attention-optimized Speculative Sampling | | 0
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | | 0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | | 0
Brevity is the soul of wit: Pruning long files for code generation | | 0
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | | 0
Can LLMs Enable Verification in Mainstream Programming? | | 0
CELI: Controller-Embedded Language Model Interactions | | 0
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation | | 0
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | | 0
CodeMirage: Hallucinations in Code Generated by Large Language Models | | 0
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts | | 0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | | 0
CodeShell Technical Report | | 0
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | | 0
Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | | 0
Context-Augmented Code Generation Using Programming Knowledge Graphs | | 0
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | | 0
CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding | | 0
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution | | 0
Dafny as Verification-Aware Intermediate Language for Code Generation | | 0
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | | 0
Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models | | 0
Discrete Flow Matching | | 0
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | | 0
Does Few-Shot Learning Help LLM Performance in Code Synthesis? | | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training | | 0
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | | 0
Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | | 0
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs | | 0
Dynamic Scaling of Unit Tests for Code Reward Modeling | | 0
Structured Chain-of-Thought Prompting for Code Generation | | 0
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | | 0

No leaderboard results yet.