GIST: Greedy Independent Set Thresholding for Diverse Data Summarization
Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia Desalvo
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We introduce a novel subset selection problem called min-distance diversification with monotone submodular utility (MDMS), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal of MDMS is to maximize an objective function combining a monotone submodular utility term and a min-distance diversity term between any pair of selected points, subject to a cardinality constraint. We propose the GIST algorithm, which achieves a 12-approximation guarantee for MDMS by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove that it is NP-hard to approximate to within a factor of 0.5584. Finally, we demonstrate that GIST outperforms existing benchmarks for on a real-world image classification task that studies single-shot subset selection for ImageNet.