A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
2020-11-08 · INLG (ACL) 2020
Craig Thomson, Ehud Reiter
Code
- github.com/nlgcat/evaluating_accuracy (official)
Abstract
Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries. We then show how our gold standard evaluation can be used to validate automated metrics.