Improving Unsupervised Sentence Simplification Using Fine-Tuned Masked Language Models
Anonymous
Abstract
Simple word suggestion in unsupervised sentence simplification (SS) methods is mostly done independently of context. The idea of adapting and fine-tuning a context-aware model on simple data to improve performance has been discussed but not put into practice. In this paper, we propose a framework that fine-tunes a pre-trained BERT masked language model on simple English corpora to aid SS. Our analysis on public test data shows that fine-tuning on an arbitrary set of simple sentences does not necessarily yield better simplifications, although it generally brings improvements. To address this issue, we propose a self-supervised framework composed of a labeling method that estimates the usefulness of each training sample, paired with a simple linear classifier that decides whether a given sentence is included in the fine-tuning process. The fine-tuned BERT is then used in an iterative edit-based unsupervised SS model to provide contextual word suggestions. The results show that our data selection approach can improve simplifications as much as having a simple-to-complex parallel corpus.
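The data-selection idea in the abstract can be sketched in a few lines: each candidate simple sentence receives an estimated usefulness label, and a linear classifier trained on those labels then decides whether the sentence enters fine-tuning. The following is a minimal illustrative sketch, not the paper's actual method; the surface features and the toy labels are assumptions made purely for demonstration.

```python
# Minimal sketch of self-supervised data selection for fine-tuning:
# a linear classifier, trained on usefulness labels estimated for each
# sample, filters which sentences are kept. Features and labels below
# are illustrative assumptions, not the paper's labeling method.
from sklearn.linear_model import LogisticRegression


def features(sentence):
    """Hypothetical surface features: token count and mean word length."""
    tokens = sentence.split()
    avg_len = sum(len(t) for t in tokens) / len(tokens)
    return [len(tokens), avg_len]


# Toy training pairs: (sentence, estimated usefulness label).
labeled = [
    ("The cat sat on the mat", 1),
    ("Utilizing multifaceted methodologies necessitates deliberation", 0),
    ("She walked to the shop", 1),
    ("Notwithstanding aforementioned considerations", 0),
]

X = [features(s) for s, _ in labeled]
y = [label for _, label in labeled]

# Simple linear classifier deciding inclusion in fine-tuning.
clf = LogisticRegression().fit(X, y)


def include_in_finetuning(sentence):
    """Return True if the classifier keeps this sentence for fine-tuning."""
    return bool(clf.predict([features(sentence)])[0])
```

In practice the selected sentences would then be fed to masked-language-model fine-tuning of BERT; this sketch only shows the selection step.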