
“This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation

2020-12-01 · ACL (EvalNLGEval, INLG) 2020

Stephanie Schoch, Diyi Yang, Yangfeng Ji


Abstract

Despite recent efforts to review current human evaluation practices in natural language generation (NLG) research, the lack of reported question wording and the potential for framing effects or cognitive biases to influence results have been widely overlooked. In this opinion paper, we detail three possible framing effects and cognitive biases that could affect human evaluation in NLG. On this basis, we call for increased transparency in human evaluation for NLG and propose the concept of human evaluation statements. We recommend reporting the design details that could influence results, such as question wording, and suggest that doing so can improve both comparability across studies and reproducibility of results.
