SOTAVerified

HumanEval

Papers

Showing 1-25 of 264 papers

| Title | Status | Hype |
|---|---|---|
| Turning the Tide: Repository-based Code Reflection | | 0 |
| Rethinking Verification for LLM Code Generation: From Generation to Testing | Code | 1 |
| any4: Learned 4-bit Numeric Representation for LLMs | Code | 2 |
| SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | | 0 |
| Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees | | 0 |
| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0 |
| SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation | | 0 |
| Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | | 0 |
| Actor-Critic based Online Data Mixing For Language Model Pre-Training | | 0 |
| Self-Correcting Code Generation Using Small Language Models | Code | 0 |
| An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | | 0 |
| Evaluating Large Language Models for Code Review | | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0 |
| From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation? | | 0 |
| Prior Prompt Engineering for Reinforcement Fine-Tuning | | 0 |
| Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | Code | 1 |
| Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | Code | 0 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | | 0 |
| Rethinking Repetition Problems of LLMs in Code Generation | Code | 1 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | | 0 |
| Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | Code | 0 |

No leaderboard results yet.