Models in the Wild: On Corruption Robustness of NLP Systems
Anonymous
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Natural Language Processing models lack a unified approach to robustness testing. In this paper we introduce WildNLP - a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspelling occur. We compare robustness of models from 4 popular NLP tasks: Q&A, NLI, NER and Sentiment Analysis by testing their performance on aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the- art text representations and non-contextualized word embeddings. In order to improve robust- ness, we perform adversarial training on se- lected aspects and check its transferability to the improvement of models with various cor- ruption types. We find that the high perfor- mance of models does not ensure sufficient robustness, although modern embedding tech- niques help to improve it. We release cor- rupted datasets and code for WildNLP frame- work for the community.