SOTAVerified

HumanEval

Papers

Showing 201-250 of 264 papers

Title | Status | Hype
Discrete Flow Matching | - | 0
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | - | 0
Brevity is the soul of wit: Pruning long files for code generation | - | 0
Towards Large Language Model Aided Program Refinement | - | 0
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | - | 0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | - | 0
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | - | 0
Validating LLM-Generated Programs with Metamorphic Prompt Testing | - | 0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | - | 0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training | - | 0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | - | 0
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | - | 0
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | - | 0
Kotlin ML Pack: Technical Report | - | 0
Can Github issues be solved with Tree Of Thoughts? | Code | 0
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | - | 0
BASS: Batched Attention-optimized Speculative Sampling | - | 0
NExT: Teaching Large Language Models to Reason about Code Execution | - | 0
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | - | 0
Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Code | 0
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | - | 0
Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | - | 0
CodeShell Technical Report | - | 0
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | - | 0
Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study | Code | 0
Software Vulnerability and Functionality Assessment using LLMs | - | 0
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | - | 0
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | - | 0
Test-Driven Development for Code Generation | - | 0
HumanEval on Latest GPT Models -- 2024 | Code | 0
Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | - | 0
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | - | 0
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Code | 0
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | - | 0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | - | 0
Instruction Fusion: Advancing Prompt Evolution through Hybridization | Code | 0
A Review of Repository Level Prompting for LLMs | - | 0
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | - | 0
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | - | 0
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Code | 0
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | - | 0
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | - | 0
The Program Testing Ability of Large Language Models for Code | - | 0
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Code | 0
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Code | 0
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression | - | 0
Can Programming Languages Boost Each Other via Instruction Tuning? | Code | 0
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation | - | 0
Page 5 of 6

No leaderboard results yet.