CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
Filip Radenović, Giorgos Tolias, Ondřej Chum
Abstract
Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement depends on extensive manual annotation, needed either to train from scratch or to fine-tune for the target task. In this work, we propose to fine-tune CNNs for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular-object retrieval with compact codes.
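
To make the notion of a hard negative concrete, the following is a minimal sketch, not the authors' implementation. It assumes that every image has an L2-normalized descriptor and a cluster label identifying the 3D model it belongs to, and it treats as hard negatives the images most similar to the query that come from a different 3D model. The function name `hard_negatives` and all parameter names are hypothetical.

```python
import numpy as np

def hard_negatives(query_vec, pool_vecs, pool_clusters, query_cluster, k=2):
    """Return indices of the k pool images most similar to the query
    that belong to a different cluster (3D model) than the query.

    Assumption: all descriptors are L2-normalized, so the dot product
    equals cosine similarity.
    """
    sims = pool_vecs @ query_vec      # similarity of each pool image to the query
    order = np.argsort(-sims)         # most similar first
    # Keep only images from other clusters; these are non-matching by construction.
    picked = [int(i) for i in order if pool_clusters[i] != query_cluster]
    return picked[:k]

# Toy example with 2D descriptors (illustrative values only).
query = np.array([1.0, 0.0])
pool = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
clusters = [0, 1, 1, 2]
print(hard_negatives(query, pool, clusters, query_cluster=0, k=2))
```

In this toy setup the image at index 0 is skipped because it shares the query's cluster, so the selection falls on the closest images from other clusters. Hard positives would be chosen by a separate criterion derived from the 3D models, which this sketch does not cover.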
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Oxf105k | siaMAC+QE* | mAP | 77.9 | — | Unverified |
| Oxf5k | siaMAC+QE* | mAP | 82.9 | — | Unverified |
| Par106k | siaMAC+QE* | mAP | 78.3 | — | Unverified |
| Par6k | siaMAC+QE* | mAP | 85.6 | — | Unverified |
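
For readers unfamiliar with the metric, the sketch below shows the basic, non-interpolated definition of mean average precision (mAP) over ranked retrieval lists. It is illustrative only: it assumes every relevant image appears somewhere in the ranked list, and it does not reproduce the official evaluation protocol of these benchmarks, which may differ in detail (for example in how ignored images are handled). The function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: sequence of booleans (or 0/1), ordered by rank,
    where a truthy value marks a relevant retrieved image.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_query_relevance):
    """Mean of the per-query average precisions."""
    scores = [average_precision(r) for r in per_query_relevance]
    return sum(scores) / len(scores)

# Two toy queries: relevant images at ranks 1 and 3, and at rank 2.
print(mean_average_precision([[1, 0, 1], [0, 1]]))
```

The values in the table are reported on a 0 to 100 scale, so a result from this sketch would be multiplied by 100 for comparison.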