
Know your tools well: Better and faster QA with synthetic examples

2021-10-16 · ACL ARR October 2021

Anonymous

Abstract

Synthetic training data---commonly used to augment human-labeled examples in supervised learning---are often noisy, but can be generated in very large quantities and with great diversity. This paper proposes to leverage these unique attributes in a targeted manner to maximize the utility of synthetic examples. Via two novel applications that use synthetic data for targeted pre-training and knowledge distillation, we demonstrate the feasibility of this idea for machine reading comprehension (MRC). Using our proposed methods, we train MRC models that are simultaneously smaller, faster, and more accurate than those produced by existing synthetic augmentation methods. Our methods are generic in nature and can be applied to any task for which synthetic data can be generated.
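The abstract does not spell out the distillation setup, but a common pattern for the kind of knowledge distillation it mentions is to run a large teacher model over synthetic examples and train a smaller student to match the teacher's softened output distribution. The sketch below shows the standard distillation objective (Hinton et al., 2015) that such a pipeline could use; the function names and temperature value are illustrative, not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) between temperature-softened distributions,
    # scaled by T^2 so gradients keep a comparable magnitude across temperatures.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s)
    )
```

In an MRC setting the logits would typically be the model's answer-span scores on a synthetic question; because synthetic data are cheap to produce at scale, the student can be trained on far more teacher-labeled examples than human annotation would allow. The loss is zero when the student exactly matches the teacher and positive otherwise.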
