SOTAVerified

NLG Evaluation

Evaluation of text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 51-71 of 71 papers

Title | Hype
Agreement is overrated: A plea for correlation to assess human evaluation reliability | 0
The use of rating and Likert scales in Natural Language Generation human evaluation tasks: A review and some recommendations | 0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating | 0
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models | 0
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons | 0
Treat the system like a human student: Automatic naturalness evaluation of generated text without reference texts | 0
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text | 0
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation | 0
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts | 0
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap | 0
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects | 0
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications | 0
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? | 0
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | 0
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation | 0
DHP Benchmark: Are LLMs Good NLG Evaluators? | 0
Dialect-robust Evaluation of Generated Text | 0
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG | 0
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking | 0
A Tutorial on Evaluation Metrics used in Natural Language Generation | 0
Evaluation of Text Generation: A Survey | 0
Page 3 of 3

No leaderboard results yet.