Grafit: Learning fine-grained image representations with coarse labels
Hugo Touvron, Alexandre Sablayrolles, Matthijs Douze, Matthieu Cord, Hervé Jégou
Abstract
This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying images at a finer granularity than that available at train time. It also improves the accuracy for transfer learning tasks to fine-grained datasets, thereby establishing the new state of the art on five public benchmarks, like iNaturalist-2018.
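The evaluation setting the abstract describes, classifying images at a finer granularity than the training labels, can be illustrated with a toy nearest-neighbor classifier over embeddings. This is a minimal sketch and not the paper's implementation: the 2-D embeddings, the label names, the choice of cosine similarity, and the `knn_predict` helper are all assumptions made for illustration. In the real setting the embeddings would come from a network trained with coarse labels only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, gallery, fine_labels, k=3):
    """Predict a fine label by majority vote over the k most similar gallery items."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine(query, gallery[i]),
        reverse=True,
    )
    votes = Counter(fine_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical embeddings: every item shares the coarse label "dog",
# but fine labels are available for the gallery at test time.
gallery = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
fine_labels = ["poodle", "poodle", "beagle", "beagle"]

prediction = knn_predict([0.95, 0.15], gallery, fine_labels, k=3)
print(prediction)
```

The point of the sketch is that fine-grained prediction succeeds only if the embedding space separates the fine classes, which is the property the paper aims to learn without fine-grained supervision.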
Benchmark Results
| Dataset | Model | Metric | Claimed (%) | Verified (%) | Status |
|---|---|---|---|---|---|
| Food-101 | Grafit (RegNet-8GF) | Accuracy | 93.7 | — | Unverified |
| Oxford 102 Flowers | Grafit (RegNet-8GF) | Accuracy | 99.1 | — | Unverified |
| Stanford Cars | Grafit (RegNet-8GF) | Accuracy | 94.7 | — | Unverified |