
Coping with Noisy Training Data Labels in Paraphrase Detection

2021-11-01 · WNUT (ACL) 2021

Teemu Vahtola, Mathias Creutz, Eetu Sjöblom, Sami Itkonen


Abstract

We present new state-of-the-art benchmarks for paraphrase detection on all six languages in the Opusparcus sentential paraphrase corpus: English, Finnish, French, German, Russian, and Swedish. We obtain these results by fine-tuning BERT. The best results are achieved on smaller and cleaner subsets of the training sets than were used in previous work. Additionally, we study a translation-based approach that is competitive for the languages with more limited and noisier training data.
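The abstract's key observation is that smaller, cleaner training subsets outperform larger, noisier ones. A minimal sketch of that idea, assuming each candidate pair carries a quality score (the field name `score` and the threshold are illustrative assumptions, not the paper's exact setup):

```python
# Sketch: keep only high-quality sentence pairs for fine-tuning.
# The "score" field and 0.5 threshold are hypothetical, for illustration only.

def select_clean_subset(pairs, min_score, max_size=None):
    """Return pairs whose quality score meets the threshold,
    sorted best-first, optionally truncated to max_size items."""
    kept = sorted(
        (p for p in pairs if p["score"] >= min_score),
        key=lambda p: p["score"],
        reverse=True,
    )
    return kept[:max_size] if max_size is not None else kept

# Toy example: three candidate paraphrase pairs with quality scores.
pairs = [
    {"sent1": "He left.", "sent2": "He went away.", "score": 0.95},
    {"sent1": "It rains.", "sent2": "The car is red.", "score": 0.10},
    {"sent1": "I'm fine.", "sent2": "I am okay.", "score": 0.80},
]
subset = select_clean_subset(pairs, min_score=0.5)
print(len(subset))  # 2
```

The filtered subset would then feed a standard BERT fine-tuning loop for sentence-pair classification; the trade-off between subset size and label noise is what the paper explores empirically.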
