SOTAVerified

NLG Evaluation

Evaluation of text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 21-30 of 71 papers

Title | Status | Hype
DHP Benchmark: Are LLMs Good NLG Evaluators? | | 0
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models | Code | 0
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling | Code | 0
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation | Code | 0
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models | Code | 0
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation | Code | 0
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models | | 0
Are LLM-based Evaluators Confusing NLG Quality Criteria? | Code | 0
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation | Code | 0
LLM-based NLG Evaluation: Current Status and Challenges | | 0
Page 3 of 8

No leaderboard results yet.