ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring
David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel
Code
- github.com/google-research/remixmatch (official, in paper, TensorFlow, ★ 131)
- github.com/google-research/mixmatch (official, in paper, TensorFlow, ★ 0)
- github.com/zysymu/AdaMatch-pytorch (PyTorch, ★ 26)
Abstract
We improve the recently proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between 5× and 16× less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach 93.73% accuracy (compared to MixMatch's accuracy of 93.58% with 4,000 examples) and a median accuracy of 84.92% with just four labels per class. We make our code and data open-source at https://github.com/google-research/remixmatch.
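The two techniques named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not taken from the official repository. The function names, the ratio-based rescaling used for distribution alignment, and the choice of cross-entropy as the "closeness" measure for augmentation anchoring are all assumptions made for illustration. The model, the augmentations themselves, and the rest of the training loop are omitted.

```python
import math


def align_distribution(pred, label_marginal, running_pred_mean, eps=1e-8):
    """Hypothetical helper: rescale one unlabeled prediction so that, on
    average, predictions move toward the label marginal, then renormalize."""
    scaled = [
        q * p / (m + eps)
        for q, p, m in zip(pred, label_marginal, running_pred_mean)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


def cross_entropy(target, pred, eps=1e-8):
    """Cross-entropy H(target, pred) between two class distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def anchoring_loss(weak_target, strong_preds):
    """Hypothetical helper: average cross-entropy between the target derived
    from the weakly augmented input and each of the K strong-augmentation
    predictions (assumed loss; the abstract only says 'close to')."""
    return sum(cross_entropy(weak_target, p) for p in strong_preds) / len(strong_preds)


# Toy example with 3 classes (all numbers are made up).
weak_pred = [0.6, 0.3, 0.1]          # prediction on the weakly augmented input
label_marginal = [1 / 3, 1 / 3, 1 / 3]  # class frequencies of the labeled data
running_pred_mean = [0.5, 0.3, 0.2]  # running average of unlabeled predictions

aligned = align_distribution(weak_pred, label_marginal, running_pred_mean)

# K = 2 strongly augmented views: one agrees with the target, one does not.
agreeing = [aligned, aligned]
disagreeing = [list(reversed(aligned)), list(reversed(aligned))]
loss_agree = anchoring_loss(aligned, agreeing)
loss_disagree = anchoring_loss(aligned, disagreeing)
```

In this toy case the model over-predicts the first class relative to a uniform label marginal, so alignment lowers that class's probability and raises the under-predicted third class. The anchoring loss is smallest when the strong-augmentation predictions match the weak-augmentation target.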
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| STL-10 | ReMixMatch (K=1) | Percentage correct | 93.23 | — | Unverified |
| STL-10 | CC-GAN | Percentage correct | 77.8 | — | Unverified |
| STL-10 | ReMixMatch (K=4) | Percentage correct | 93.82 | — | Unverified |