HumanEval

Papers

Showing 141–150 of 264 papers

Title	Date	Tasks	Status	Hype
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	All, GSM8K	Code Available	14
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual Learning, GSM8K	Code Available	0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results	Jun 15, 2024	Benchmarking, HumanEval	Unverified	0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases	Jun 11, 2024	Code Generation, HumanEval	Unverified	0
Validating LLM-Generated Programs with Metamorphic Prompt Testing	Jun 11, 2024	HumanEval	Unverified	0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available	0
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark	Jun 10, 2024	HumanEval, Program Synthesis	Code Available	1
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8K, HumanEval	Unverified	0
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning	Jun 3, 2024	Code Completion, Code Generation	Code Available	1
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8K, HumanEval	Code Available	3

No leaderboard results yet.