
NLG evaluation

Evaluate text generated by natural language generation (NLG) systems, such as large language models.

Papers


Showing 21–30 of 71 papers

Title	Date	Tasks	Status	Hype
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation	Dec 16, 2023	nlg evaluation, Response Generation	Unverified	0
Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation	Dec 1, 2020	nlg evaluation, Position	Unverified	0
A Snapshot of NLG Evaluation Practices 2005 - 2014	Sep 1, 2015	nlg evaluation, Text Generation	Unverified	0
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?	Apr 10, 2025	Machine Translation, nlg evaluation	Unverified	0
A Survey of Natural Language Generation	Dec 22, 2021	Data-to-Text Generation, Deep Learning	Unverified	0
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text	Jun 30, 2021	All, Articles	Unverified	0
DHP Benchmark: Are LLMs Good NLG Evaluators?	Aug 25, 2024	Benchmarking, nlg evaluation	Unverified	0
Dialect-robust Evaluation of Generated Text	Nov 2, 2022	nlg evaluation	Unverified	0
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG	May 24, 2023	Dialogue Generation, Diversity	Unverified	0
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking	Nov 8, 2024	Fact Checking, nlg evaluation	Unverified	0



No leaderboard results yet.