On the Usefulness of Synthetic Tabular Data Generation

2023-06-27

Dionysis Manousakas, Sergül Aydöre

Abstract

Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning (ML) training. Privacy-preserving synthetic data generation can accelerate data exchange for downstream tasks, but there is not enough evidence to show how or why synthetic data can boost ML training. In this study, we benchmarked ML performance using synthetic tabular data for four use cases: data sharing, data augmentation, class balancing, and data summarization. We observed marginal improvements for the balancing use case on some datasets. However, we conclude that there is not enough evidence to claim that synthetic tabular data is useful for ML training.
