Rethinking and Refining the Distinct Metric
2021-11-16 · ACL ARR November 2021
Anonymous
Abstract
Distinct is a widely used automatic metric for evaluating the diversity of text produced in language generation tasks. However, we observe that the original approach to calculating distinct scores is biased: it tends to penalize longer sequences more heavily. In this paper, we refine the calculation of distinct scores by re-scaling the number of distinct tokens by its expected value. We provide both empirical and theoretical evidence that our method effectively removes the bias exhibited by the original distinct score. Further analyses also demonstrate that the refined score correlates better with human evaluations.
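The abstract does not state the formulas, so the following is only a minimal sketch of the idea under stated assumptions. It takes the original score to be the number of unique tokens divided by the total number of tokens, and it assumes the expectation is computed for tokens drawn uniformly with replacement from a vocabulary of size `vocab_size`, in which case the expected number of unique tokens after `total` draws is `vocab_size * (1 - (1 - 1/vocab_size) ** total)`. The function names and the uniform-sampling assumption are illustrative and may differ from the paper's exact definition.

```python
from typing import Sequence


def distinct(tokens: Sequence[str]) -> float:
    """Original-style score: unique tokens divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def expected_distinct_count(total: int, vocab_size: int) -> float:
    """Expected number of unique tokens after `total` uniform draws
    (with replacement) from a vocabulary of `vocab_size` tokens.
    ASSUMPTION: uniform sampling; not taken from the abstract."""
    return vocab_size * (1.0 - (1.0 - 1.0 / vocab_size) ** total)


def rescaled_distinct(tokens: Sequence[str], vocab_size: int) -> float:
    """Unique-token count divided by its expectation instead of by length."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / expected_distinct_count(len(tokens), vocab_size)


if __name__ == "__main__":
    vocab_size = 50  # hypothetical vocabulary size for illustration
    for length in (10, 100, 1000):
        # Ratio of expected unique tokens to length: what the original
        # score would give, on average, for uniformly random text.
        ratio = expected_distinct_count(length, vocab_size) / length
        print(f"length={length:5d}  expected original score={ratio:.3f}")
```

Under this assumption, the expected original score shrinks as the sequence grows even though the underlying token distribution is unchanged, which illustrates the length bias the abstract describes. Dividing by the expected count instead of the length keeps uniformly random text near 1 regardless of length.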