
On the Evaluation Metrics for Paraphrase Generation

2022-02-17 · Code Available

Lingfeng Shen, Lemao Liu, Haiyun Jiang, Shuming Shi


Abstract

In this paper we revisit automatic metrics for paraphrase evaluation and obtain two findings that disobey conventional wisdom: (1) Reference-free metrics achieve better performance than their reference-based counterparts. (2) Most commonly used metrics do not align well with human annotation. Underlying reasons behind the above findings are explored through additional experiments and in-depth analyses. Based on the experiments and analyses, we propose ParaScore, a new evaluation metric for paraphrase generation. It possesses the merits of reference-based and reference-free metrics and explicitly models lexical divergence. Experimental results demonstrate that ParaScore significantly outperforms existing metrics.
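To make the abstract's description concrete, here is a minimal, hypothetical sketch of a ParaScore-style metric, not the authors' implementation. It assumes a generic semantic-similarity function `sim()`; a learned similarity such as BERTScore would be used in practice, and `difflib.SequenceMatcher` serves here only as a lightweight stand-in. The `max()` over source- and reference-based similarity combines the reference-free and reference-based signals, and the `omega`-weighted term explicitly rewards lexical divergence from the source; the weighting scheme and helper names are illustrative assumptions.

```python
# Hypothetical sketch of a ParaScore-style metric (not the paper's implementation).
# Assumption: sim() stands in for a learned semantic similarity (e.g. BERTScore);
# difflib.SequenceMatcher is used purely as a lightweight placeholder.
from difflib import SequenceMatcher


def sim(a: str, b: str) -> float:
    """Placeholder word-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def lexical_divergence(source: str, candidate: str) -> float:
    """Surface-form divergence: how much the candidate rewords the source."""
    return 1.0 - sim(source, candidate)


def parascore_sketch(source: str, reference: str, candidate: str,
                     omega: float = 0.1) -> float:
    # Merge reference-free (vs. source) and reference-based (vs. reference)
    # signals by taking the stronger one, then add a bonus for rewording.
    semantic = max(sim(candidate, source), sim(candidate, reference))
    return semantic + omega * lexical_divergence(source, candidate)


score = parascore_sketch(
    source="the cat sat on the mat",
    reference="a cat was sitting on the mat",
    candidate="the feline rested on the rug",
)
print(round(score, 3))
```

A candidate that copies the source verbatim would score high on similarity but receive no divergence bonus, which is the failure mode the divergence term is meant to penalize.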
