SOTAVerified

NLG Evaluation

Evaluation of text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 51–71 of 71 papers

Title | Status | Hype
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation | Code | 0
Are LLM-based Evaluators Confusing NLG Quality Criteria? | Code | 0
A Study of Automatic Metrics for the Evaluation of Natural Language Explanations | Code | 0
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling | Code | 0
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding | Code | 0
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics | Code | 0
CLSE: Corpus of Linguistically Significant Entities | Code | 0
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation | Code | 0
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering | Code | 0
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation | Code | 0
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions | Code | 0
Long-Form Information Alignment Evaluation Beyond Atomic Facts | Code | 0
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets | Code | 0
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References | Code | 0
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation | Code | 0
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs | Code | 0
Perturbation CheckLists for Evaluating NLG Evaluation Metrics | Code | 0
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models | Code | 0
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation | Code | 0
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models | Code | 0
Why We Need New Evaluation Metrics for NLG | Code | 0

No leaderboard results yet.