
nlg evaluation

Evaluating text generated by Natural Language Generation (NLG) systems, such as large language models.

Papers


Showing 26–50 of 71 papers

Title	Date	Tasks	Status
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation	May 16, 2024	nlg evaluation, Text Generation	Code Available
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models	Mar 28, 2024	nlg evaluation	Unverified
Are LLM-based Evaluators Confusing NLG Quality Criteria?	Feb 19, 2024	nlg evaluation	Code Available
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation	Feb 18, 2024	All, nlg evaluation	Code Available
LLM-based NLG Evaluation: Current Status and Challenges	Feb 2, 2024	nlg evaluation, Text Generation	Unverified
The Pitfalls of Defining Hallucination	Jan 15, 2024	Hallucination, nlg evaluation	Unverified
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation	Dec 16, 2023	nlg evaluation, Response Generation	Unverified
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects	Nov 15, 2023	Dialogue Generation, Language Modelling	Unverified
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation	Aug 6, 2023	Diversity, nlg evaluation	Code Available
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models	Jul 15, 2023	nlg evaluation, Response Generation	Code Available
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering	Jul 13, 2023	Dialogue Generation, nlg evaluation	Code Available
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap	Jun 1, 2023	Explainable Artificial Intelligence (XAI), nlg evaluation	Unverified
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG	May 24, 2023	Dialogue Generation, Diversity	Unverified
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References	May 24, 2023	All, Machine Translation	Code Available
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions	Mar 7, 2023	nlg evaluation, Representation Learning	Code Available
CLSE: Corpus of Linguistically Significant Entities	Nov 4, 2022	nlg evaluation, Text Generation	Code Available
Dialect-robust Evaluation of Generated Text	Nov 2, 2022	nlg evaluation	Unverified
NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation	Oct 1, 2022	Management, nlg evaluation	Unverified
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics	Sep 20, 2022	CPU, GPU	Code Available
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating	Jul 1, 2022	nlg evaluation, Semantic Similarity	Unverified
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating	May 24, 2022	nlg evaluation, Semantic Similarity	Unverified
The Authenticity Gap in Human Evaluation	May 24, 2022	nlg evaluation, Single Particle Analysis	Unverified
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets	May 13, 2022	nlg evaluation, Question Answering	Code Available
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications	May 13, 2022	nlg evaluation, Text Generation	Unverified
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding	Apr 16, 2022	Cross-Lingual Natural Language Inference, Natural Language Inference	Code Available


No leaderboard results yet.