SOTAVerified

NLG Evaluation

Evaluate text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 26-50 of 71 papers

Title | Status | Hype
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation | Code | 0
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models | Code | 0
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering | Code | 0
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap | | 0
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG | | 0
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References | Code | 0
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory | Code | 1
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | Code | 3
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment | Code | 1
Is ChatGPT a Good NLG Evaluator? A Preliminary Study | Code | 1
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions | Code | 0
CLSE: Corpus of Linguistically Significant Entities | Code | 0
Dialect-robust Evaluation of Generated Text | | 0
Towards a Unified Multi-Dimensional Evaluator for Text Generation | Code | 2
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis | Code | 1
NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation | | 0
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics | Code | 0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating | | 0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating | | 0
The Authenticity Gap in Human Evaluation | | 0
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications | | 0
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets | Code | 0
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding | Code | 0
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons | Code | 1
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text | | 0

No leaderboard results yet.