
nlg evaluation

Evaluate text generated by natural language generation (NLG) systems, such as large language models.

Papers


Showing 61–70 of 71 papers

Title	Date	Tasks	Status	Hype	Score
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects	Nov 15, 2023	Dialogue Generation, Language Modelling	Unverified	0	0
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications	May 13, 2022	nlg evaluation, Text Generation	Unverified	0	0
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?	Apr 10, 2025	Machine Translation, nlg evaluation	Unverified	0	0
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text	Nov 25, 2024	Language Modeling, Language Modelling	Unverified	0	0
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation	Mar 29, 2017	nlg evaluation, Survey	Unverified	0	0
DHP Benchmark: Are LLMs Good NLG Evaluators?	Aug 25, 2024	Benchmarking, nlg evaluation	Unverified	0	0
Dialect-robust Evaluation of Generated Text	Nov 2, 2022	nlg evaluation	Unverified	0	0
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG	May 24, 2023	Dialogue Generation, Diversity	Unverified	0	0
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking	Nov 8, 2024	Fact Checking, nlg evaluation	Unverified	0	0
A Tutorial on Evaluation Metrics used in Natural Language Generation	Jun 1, 2021	nlg evaluation, Text Generation	Unverified	0	0



No leaderboard results yet.