SOTAVerified

NLG Evaluation

Evaluation of text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 51–71 of 71 papers

Title | Status | Hype
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation | Code | 0
Are LLM-based Evaluators Confusing NLG Quality Criteria? | Code | 0
A Study of Automatic Metrics for the Evaluation of Natural Language Explanations | Code | 0
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling | Code | 0
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding | Code | 0
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics | Code | 0
CLSE: Corpus of Linguistically Significant Entities | Code | 0
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation | Code | 0
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering | Code | 0
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation | Code | 0
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions | Code | 0
Long-Form Information Alignment Evaluation Beyond Atomic Facts | Code | 0
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets | Code | 0
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References | Code | 0
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation | Code | 0
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs | Code | 0
Perturbation CheckLists for Evaluating NLG Evaluation Metrics | Code | 0
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models | Code | 0
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation | Code | 0
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models | Code | 0
Why We Need New Evaluation Metrics for NLG | Code | 0

No leaderboard results yet.