SOTAVerified

nlg evaluation

Evaluate the text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 26–50 of 71 papers

Title	Date	Tasks	Status	Score
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation	Feb 18, 2024	All, nlg evaluation	Code Available	5
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs	Mar 14, 2025	nlg evaluation	Code Available	5
Perturbation CheckLists for Evaluating NLG Evaluation Metrics	Sep 13, 2021	Data-to-Text Generation, nlg evaluation	Code Available	5
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models	Jul 16, 2024	nlg evaluation, Text Generation	Code Available	5
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation	Aug 6, 2023	Diversity, nlg evaluation	Code Available	5
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models	May 23, 2024	nlg evaluation, Text Generation	Code Available	5
Why We Need New Evaluation Metrics for NLG	Jul 21, 2017	nlg evaluation	Code Available	5
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models	Jul 15, 2023	nlg evaluation, Response Generation	Code Available	5
Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation	Dec 1, 2020	nlg evaluation, Position	Unverified	0
Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators	Mar 6, 2025	nlg evaluation	Unverified	0
A Survey of Natural Language Generation	Dec 22, 2021	Data-to-Text Generation, Deep Learning	Unverified	0
The Authenticity Gap in Human Evaluation	May 24, 2022	nlg evaluation, Single Particle Analysis	Unverified	0
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation	Jun 10, 2021	nlg evaluation, Text Generation	Unverified	0
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation	Dec 17, 2021	nlg evaluation, Text Generation	Unverified	0
A Survey of Evaluation Metrics Used for NLG Systems	Aug 27, 2020	Image Captioning, nlg evaluation	Unverified	0
Language Model Augmented Relevance Score	Aug 19, 2021	Language Modeling, Language Modelling	Unverified	0
Large Language Models Are Active Critics in NLG Evaluation	Oct 14, 2024	nlg evaluation, Prompt Engineering	Unverified	0
A Snapshot of NLG Evaluation Practices 2005 - 2014	Sep 1, 2015	nlg evaluation, Text Generation	Unverified	0
LLM-based NLG Evaluation: Current Status and Challenges	Feb 2, 2024	nlg evaluation, Text Generation	Unverified	0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating	Jul 1, 2022	nlg evaluation, Semantic Similarity	Unverified	0
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text	Aug 1, 2021	All, Articles	Unverified	0
MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation	Jul 24, 2021	Diversity, nlg evaluation	Unverified	0
The Pitfalls of Defining Hallucination	Jan 15, 2024	Hallucination, nlg evaluation	Unverified	0
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text	Jun 30, 2021	All, Articles	Unverified	0
NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation	Oct 1, 2022	Management, nlg evaluation	Unverified	0
