Know your tools well: Better and faster QA with synthetic examples
Anonymous
Abstract
Synthetic training data, commonly used to augment human-labeled examples in supervised learning, are often noisy but can be generated in very large quantities and with great diversity. This paper proposes to leverage these unique attributes in a targeted manner to maximize the utility of synthetic examples. Via two novel applications that use synthetic data for targeted pre-training and knowledge distillation, we demonstrate the feasibility of this idea for machine reading comprehension (MRC). Using our proposed methods, we train MRC models that are simultaneously smaller, faster, and more accurate than those produced by existing synthetic augmentation methods. Our methods are generic in nature and can be applied to any task for which synthetic data can be generated.