SOTAVerified

NLG Evaluation

Evaluate text generated by NLG (natural language generation) systems, such as large language models.

Papers

Showing 31-40 of 71 papers

| Title | Status | Hype |
| --- | --- | --- |
| The Pitfalls of Defining Hallucination | | 0 |
| CoAScore: Chain-of-Aspects Prompting for NLG Evaluation | | 0 |
| X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects | | 0 |
| Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation | Code | 0 |
| LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models | Code | 0 |
| DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering | Code | 0 |
| Rethinking Model Evaluation as Narrowing the Socio-Technical Gap | | 0 |
| Dolphin: A Challenging and Diverse Benchmark for Arabic NLG | | 0 |
| Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References | Code | 0 |
| Describe me an Auklet: Generating Grounded Perceptual Category Descriptions | Code | 0 |

No leaderboard results yet.