nlg evaluation

Evaluating text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 1–50 of 71 papers

Title	Date	Tasks	Status	Hype
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist	May 15, 2023	Controllable Language Modelling, Dialogue Generation	Code Available	3
Towards a Unified Multi-Dimensional Evaluator for Text Generation	Oct 13, 2022	nlg evaluation, Question Answering	Code Available	2
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis	Oct 10, 2022	All, Image Captioning	Code Available	1
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation	Sep 14, 2021	nlg evaluation, Style Transfer	Code Available	1
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges	Jan 13, 2024	nlg evaluation, Specificity	Code Available	1
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory	May 24, 2023	nlg evaluation, Text Generation	Code Available	1
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment	Mar 29, 2023	Dialogue Generation, Diversity	Code Available	1
LUNA: A Framework for Language Understanding and Naturalness Assessment	Jan 9, 2024	nlg evaluation, Text Generation	Code Available	1
Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability	Jun 26, 2024	Language Modeling, Language Modelling	Code Available	1
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons	Mar 11, 2022	nlg evaluation	Code Available	1
Is ChatGPT a Good NLG Evaluator? A Preliminary Study	Mar 7, 2023	nlg evaluation, Story Generation	Code Available	1
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation	Dec 16, 2023	nlg evaluation, Response Generation	Unverified	0
Treat the system like a human student: Automatic naturalness evaluation of generated text without reference texts	Nov 1, 2018	Image Captioning, Machine Translation	Unverified	0
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models	Mar 28, 2024	nlg evaluation	Unverified	0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating	May 24, 2022	nlg evaluation, Semantic Similarity	Unverified	0
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects	Nov 15, 2023	Dialogue Generation, Language Modelling	Unverified	0
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons	Jun 16, 2021	nlg evaluation	Unverified	0
A Snapshot of NLG Evaluation Practices 2005 - 2014	Sep 1, 2015	nlg evaluation, Text Generation	Unverified	0
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications	May 13, 2022	nlg evaluation, Text Generation	Unverified	0
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?	Apr 10, 2025	Machine Translation, nlg evaluation	Unverified	0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating	Jul 1, 2022	nlg evaluation, Semantic Similarity	Unverified	0
A Survey of Evaluation Metrics Used for NLG Systems	Aug 27, 2020	Image Captioning, nlg evaluation	Unverified	0
DHP Benchmark: Are LLMs Good NLG Evaluators?	Aug 25, 2024	Benchmarking, nlg evaluation	Unverified	0
Dialect-robust Evaluation of Generated Text	Nov 2, 2022	nlg evaluation	Unverified	0
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG	May 24, 2023	Dialogue Generation, Diversity	Unverified	0
NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation	Oct 1, 2022	Management, nlg evaluation	Unverified	0
Evaluation of Text Generation: A Survey	Jun 26, 2020	nlg evaluation, Survey	Unverified	0
Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation	Dec 1, 2020	nlg evaluation, Position	Unverified	0
Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators	Mar 6, 2025	nlg evaluation	Unverified	0
The Authenticity Gap in Human Evaluation	May 24, 2022	nlg evaluation, Single Particle Analysis	Unverified	0
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation	Jun 10, 2021	nlg evaluation, Text Generation	Unverified	0
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation	Dec 17, 2021	nlg evaluation, Text Generation	Unverified	0
Language Model Augmented Relevance Score	Aug 19, 2021	Language Modeling, Language Modelling	Unverified	0
Large Language Models Are Active Critics in NLG Evaluation	Oct 14, 2024	nlg evaluation, Prompt Engineering	Unverified	0
LLM-based NLG Evaluation: Current Status and Challenges	Feb 2, 2024	nlg evaluation, Text Generation	Unverified	0
A Survey of Natural Language Generation	Dec 22, 2021	Data-to-Text Generation, Deep Learning	Unverified	0
MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation	Jul 24, 2021	Diversity, nlg evaluation	Unverified	0
A Tutorial on Evaluation Metrics used in Natural Language Generation	Jun 1, 2021	nlg evaluation, Text Generation	Unverified	0
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking	Nov 8, 2024	Fact Checking, nlg evaluation	Unverified	0
Agreement is overrated: A plea for correlation to assess human evaluation reliability	Oct 1, 2019	nlg evaluation	Unverified	0
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts	Apr 29, 2025	All, Diversity	Unverified	0
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text	Jun 30, 2021	All, Articles	Unverified	0
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text	Aug 1, 2021	All, Articles	Unverified	0
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text	Feb 14, 2022	nlg evaluation, Text Generation	Unverified	0
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap	Jun 1, 2023	Explainable Artificial Intelligence (XAI), nlg evaluation	Unverified	0
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text	Nov 25, 2024	Language Modeling, Language Modelling	Unverified	0
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation	Mar 29, 2017	nlg evaluation, Survey	Unverified	0
The Pitfalls of Defining Hallucination	Jan 15, 2024	Hallucination, nlg evaluation	Unverified	0
The use of rating and Likert scales in Natural Language Generation human evaluation tasks: A review and some recommendations	Oct 1, 2019	nlg evaluation, Text Generation	Unverified	0
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models	Jul 15, 2023	nlg evaluation, Response Generation	Code Available	0

No leaderboard results yet.