
Exact Paired Permutation Testing Algorithms for NLP Systems

2022-01-16 · ACL ARR January 2022

Anonymous


Abstract

Significance testing has played a vital role in the development of NLP systems, providing confidence that one system is indeed better than another. However, many significance tests involve hard computational problems, and so we rely on approximation methods such as Monte Carlo sampling. In this paper, we provide an exact dynamic programming algorithm that runs in quadratic time in the size of the dataset and performs the paired permutation test, a test widely used to compare two systems, for the case of comparing accuracies between two classification systems. We show that Monte Carlo approximations are often too noisy to reliably determine whether we can reject the null hypothesis at a significance level of 0.05 for any number of sentences N. Additionally, we show that our exact algorithm is more efficient than the approximation algorithm for N 10K.
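To make the setting concrete, here is a minimal Python sketch of a two-sided paired permutation test on accuracy, computed once exactly with a simple quadratic-time dynamic program and once with Monte Carlo sampling. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the two-sided test statistic (absolute difference in the number of correct predictions), and the default sample count are all hypothetical choices made for this example.

```python
from fractions import Fraction
import random


def exact_paired_permutation_pvalue(correct_a, correct_b):
    """Exact two-sided p-value via dynamic programming (illustrative sketch).

    correct_a[i] and correct_b[i] are 0/1 flags saying whether each system
    classified example i correctly. Swapping the two systems' outputs on
    example i flips the sign of the per-example difference d_i.
    """
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))
    n = len(diffs)
    # counts[k] = number of swap assignments whose signed total equals k - n.
    counts = [0] * (2 * n + 1)
    counts[n] = 1
    for d in diffs:
        new_counts = [0] * (2 * n + 1)
        for idx, c in enumerate(counts):
            if c:
                new_counts[idx + d] += c  # keep the pair as observed
                new_counts[idx - d] += c  # swap the pair
        counts = new_counts
    extreme = sum(c for idx, c in enumerate(counts) if abs(idx - n) >= observed)
    return Fraction(extreme, 2 ** n)


def monte_carlo_pvalue(correct_a, correct_b, num_samples=1000, seed=0):
    """Monte Carlo estimate of the same p-value (noisy by construction)."""
    rng = random.Random(seed)
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(num_samples):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    system_a = [1, 1, 1, 1, 0]
    system_b = [0, 0, 0, 0, 0]
    print(exact_paired_permutation_pvalue(system_a, system_b))  # 1/8
    print(monte_carlo_pvalue(system_a, system_b))
```

The exact routine does O(N) work for each of the N examples, which matches the quadratic scaling mentioned in the abstract only in spirit. The Monte Carlo estimate changes with the seed and the number of samples, which is the kind of noise the abstract refers to.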
