SOTAVerified

NLG Evaluation

Evaluation of text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 51-71 of 71 papers

Title | Hype
Agreement is overrated: A plea for correlation to assess human evaluation reliability | 0
The use of rating and Likert scales in Natural Language Generation human evaluation tasks: A review and some recommendations | 0
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating | 0
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models | 0
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons | 0
Treat the system like a human student: Automatic naturalness evaluation of generated text without reference texts | 0
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text | 0
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation | 0
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts | 0
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap | 0
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects | 0
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications | 0
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? | 0
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | 0
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation | 0
DHP Benchmark: Are LLMs Good NLG Evaluators? | 0
Dialect-robust Evaluation of Generated Text | 0
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG | 0
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking | 0
A Tutorial on Evaluation Metrics used in Natural Language Generation | 0
Evaluation of Text Generation: A Survey | 0
Page 3 of 3

No leaderboard results yet.