SOTAVerified

NLG Evaluation

Evaluating text generated by natural language generation (NLG) systems, such as large language models.

Papers

Showing 1-25 of 71 papers

Title | Status | Hype
--- | --- | ---
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | Code | 3
Towards a Unified Multi-Dimensional Evaluator for Text Generation | Code | 2
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment | Code | 1
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation | Code | 1
LUNA: A Framework for Language Understanding and Naturalness Assessment | Code | 1
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons | Code | 1
Is ChatGPT a Good NLG Evaluator? A Preliminary Study | Code | 1
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges | Code | 1
Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability | Code | 1
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory | Code | 1
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis | Code | 1
A Tutorial on Evaluation Metrics used in Natural Language Generation | - | 0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating | - | 0
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text | - | 0
A Survey of Natural Language Generation | - | 0
A Survey of Evaluation Metrics Used for NLG Systems | - | 0
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text | - | 0
Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators | - | 0
Agreement is overrated: A plea for correlation to assess human evaluation reliability | - | 0
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation | - | 0
A Snapshot of NLG Evaluation Practices 2005 - 2014 | - | 0
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking | - | 0
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications | - | 0
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? | - | 0
Evaluation of Text Generation: A Survey | - | 0

No leaderboard results yet.