Unsupervised Functional Dependency Discovery for Data Preparation

2019-03-20 · ICLR Workshop LLD

Zhihan Guo, Theodoros Rekatsinas

Abstract

We study the problem of functional dependency (FD) discovery to impose domain knowledge for downstream data preparation tasks. We introduce a framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that our methods can scale to large data instances with millions of tuples and hundreds of attributes, while recovering true FDs across a diverse array of synthetic datasets, even in the presence of noisy data. Overall, our methods show an average F1 improvement of 2× against state-of-the-art FD discovery methods. On a downstream data repair task, the FDs discovered by our system also yield a higher F1 score than manually defined FDs.
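
To make the notion of an FD on noisy data concrete, the following is a minimal sketch, not the paper's sparse-regression method. It scores a candidate dependency lhs -> rhs by the fraction of rows that would have to be removed for the dependency to hold exactly. The function name `fd_violation_rate`, the toy table, and its column names are illustrative assumptions and do not come from the paper.

```python
from collections import Counter, defaultdict


def fd_violation_rate(rows, lhs, rhs):
    """Fraction of rows to drop so that lhs -> rhs holds exactly.

    rows: list of dicts (one per tuple); lhs: list of attribute names;
    rhs: a single attribute name. Hypothetical helper for illustration.
    """
    if not rows:
        return 0.0
    # Count rhs values separately for each distinct lhs value combination.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # In each group, keep the most frequent rhs value; the rest are violations.
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return 1 - kept / len(rows)


# Toy table (assumed data): one row has a misspelled city.
rows = [
    {"zip": "53703", "city": "Madison", "state": "WI"},
    {"zip": "53703", "city": "Madison", "state": "WI"},
    {"zip": "53703", "city": "Madisn", "state": "WI"},
    {"zip": "60601", "city": "Chicago", "state": "IL"},
]

print(fd_violation_rate(rows, ["zip"], "state"))  # exact FD on this table
print(fd_violation_rate(rows, ["zip"], "city"))   # holds only approximately
```

A score of zero means the dependency holds exactly on the table. A small positive score suggests a dependency obscured by noise, which is the setting the abstract targets. Checking every candidate this way grows quickly with the number of attributes, which is one reason to look for a formulation that scales.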