SOTAVerified

NLG Evaluation

Evaluating text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 1–25 of 71 papers

Title | Status | Hype
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | Code | 3
Towards a Unified Multi-Dimensional Evaluator for Text Generation | Code | 2
Is ChatGPT a Good NLG Evaluator? A Preliminary Study | Code | 1
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment | Code | 1
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges | Code | 1
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis | Code | 1
LUNA: A Framework for Language Understanding and Naturalness Assessment | Code | 1
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons | Code | 1
Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability | Code | 1
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory | Code | 1
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation | Code | 1
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References | Code | 0
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets | Code | 0
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling | Code | 0
Long-Form Information Alignment Evaluation Beyond Atomic Facts | Code | 0
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation | Code | 0
A Study of Automatic Metrics for the Evaluation of Natural Language Explanations | Code | 0
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions | Code | 0
CLSE: Corpus of Linguistically Significant Entities | Code | 0
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics | Code | 0
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation | Code | 0
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering | Code | 0
Are LLM-based Evaluators Confusing NLG Quality Criteria? | Code | 0
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding | Code | 0
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation | Code | 0

No leaderboard results yet.