SOTAVerified

NLG Evaluation

Evaluation of text generated by NLG (Natural Language Generation) systems, such as large language models.

Papers

Showing 11–20 of 71 papers

Title | Status | Hype
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models | Code | 0
Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability | Code | 1
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling | Code | 0
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation | Code | 0
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models | Code | 0
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation | Code | 0
WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models | — | 0
Are LLM-based Evaluators Confusing NLG Quality Criteria? | Code | 0
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation | Code | 0
LLM-based NLG Evaluation: Current Status and Challenges | — | 0
Page 2 of 8

No leaderboard results yet.