SOTAVerified

Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond

2021-06-01Proceedings of the International Conference on Management of Data (SIGMOD) 2021Code Available1· sign in to hype

Zhengjie Miao, Yuliang Li and Xiaolan Wang

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Deep Learning revolutionizes almost all fields of computer science including data management. However, the demand for high-quality training data is slowing down deep neural nets' wider adoption. To this end, data augmentation (DA), which generates more labeled examples from existing ones, becomes a common technique. Meanwhile, the risk of creating noisy examples and the large space of hyper-parameters make DA less attractive in practice. We introduce Rotom, a multi-purpose data augmentation framework for a range of data management and mining tasks including entity matching, data cleaning, and text classification. Rotom features InvDA, a new DA operator that generates natural yet diverse augmented examples by formulating DA as a seq2seq task. The key technical novelty of Rotom is a meta-learning framework that automatically learns a policy for combining examples from different DA operators, whereby combinatorially reduces the hyper-parameters space. Our experimental results show that Rotom effectively improves a model's performance by combining multiple DA operators, even when applying them individually does not yield performance improvement. With this strength, Rotom outperforms the state-of-the-art entity matching and data cleaning systems in the low-resource settings as well as two recently proposed DA techniques for text classification.

Tasks

Reproductions