Dataset Factorization for Condensation

2022-11-01 · NIPS 2022 · Code Available

Songhua Liu, Kai Wang, Xingyi Yang, Jingwen Ye, Xinchao Wang

Code

github.com/huage001/datasetfactorization
In paper · PyTorch · ★ 67

Abstract

In this paper, we study dataset distillation (DD) from a novel perspective and introduce a dataset factorization approach, termed HaBa, a plug-and-play strategy portable to any existing DD baseline. Unlike conventional DD approaches that aim to produce distilled and representative samples, HaBa decomposes a dataset into two components: data Hallucination networks and Bases, where the latter are fed into the former to reconstruct image samples. The flexible combinations between bases and hallucination networks therefore equip the distilled data with an exponential informativeness gain, which largely increases the representation capability of distilled datasets. To further increase the data efficiency of the compression results, we introduce a pair of adversarial contrastive constraints on the resultant hallucination networks and bases, which increase the diversity of generated images and inject more discriminant information into the factorization. Extensive comparisons and experiments demonstrate that our method yields significant improvement on downstream classification tasks compared with previous state-of-the-art methods, while reducing the total number of compressed parameters by up to 65%. Moreover, datasets distilled by our approach also achieve 10% higher accuracy than baseline methods in cross-architecture generalization.
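
The pairing of bases with hallucination networks described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "hallucinator" is a simple elementwise affine map rather than the paper's network, and the sizes (4 bases, 3 hallucinators, 8 values per basis) are arbitrary. The only point is that every (hallucinator, basis) pair yields one reconstructed sample, so in this toy setting the number of reconstructable samples is the product of the two counts while the number of stored components is their sum.

```python
import itertools
import random


def make_hallucinator(scale, shift):
    """Toy stand-in for a hallucination network: an elementwise affine map.

    This is a hypothetical simplification, not the architecture from the paper.
    """
    def hallucinate(basis):
        return [scale * x + shift for x in basis]
    return hallucinate


rng = random.Random(0)

# Hypothetical sizes: 4 stored bases of 8 values each, 3 hallucinators.
bases = [[rng.random() for _ in range(8)] for _ in range(4)]
hallucinators = [
    make_hallucinator(scale, shift)
    for scale, shift in [(1.0, 0.0), (0.5, 0.1), (2.0, -0.3)]
]

# Feed every basis into every hallucinator: one sample per pair.
samples = [h(b) for h, b in itertools.product(hallucinators, bases)]

num_stored = len(bases) + len(hallucinators)  # components kept in storage
num_samples = len(samples)                    # samples that can be reconstructed
print(num_stored, num_samples)
```

In the actual method both components are learned jointly with a DD baseline; this sketch only shows the combinatorial pairing, not the training.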

Tasks

Dataset Distillation, Diversity, Hallucination, Informativeness
