SOTAVerified

NLG Evaluation

Evaluation of text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 61–70 of 71 papers

Title | Status | Hype
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions | Code | 0
Long-Form Information Alignment Evaluation Beyond Atomic Facts | Code | 0
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets | Code | 0
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References | Code | 0
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation | Code | 0
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs | Code | 0
Perturbation CheckLists for Evaluating NLG Evaluation Metrics | Code | 0
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models | Code | 0
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation | Code | 0
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models | Code | 0

No leaderboard results yet.