Augmenting Convolutional networks with attention-based aggregation
Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou
Code Available
- github.com/facebookresearch/deit (official, PyTorch) ★ 4,327
- github.com/DarshanDeshpande/jax-models (JAX) ★ 161
- github.com/dongkyuk/PatchConvNet-pytorch (PyTorch) ★ 29
- github.com/keras-team/keras-io/blob/master/examples/vision/patch_convnet.py (TensorFlow/Keras) ★ 0
- github.com/mindspore-courses/External-Attention-MindSpore/blob/main/model/backbone/PatchConvnet.py (MindSpore) ★ 0
Abstract
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We replace the final average pooling with an attention-based aggregation layer, akin to a single transformer block, that weights how the patches are involved in the classification decision. We combine this learned aggregation layer with a simple patch-based convolutional network parameterized by two parameters (width and depth). In contrast with a pyramidal design, this architecture family maintains the input patch resolution across all the layers. It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption, as shown by our experiments on various computer vision tasks: object classification, image segmentation and detection.
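The core idea of the abstract can be sketched in a few lines of PyTorch: replace global average pooling with a single learned query token that cross-attends to the patch feature map, so each patch receives a learned weight in the final decision. This is a minimal illustration, not the paper's exact layer; the class name, hyperparameters, and use of `nn.MultiheadAttention` are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Minimal sketch of attention-based aggregation: a single learned
    query token cross-attends to the patch tokens, replacing the final
    global average pooling of a convolutional trunk (illustrative only)."""

    def __init__(self, dim: int, num_heads: int = 1):
        super().__init__()
        # Learned "class" query: one token shared across the batch.
        self.query = nn.Parameter(torch.zeros(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from the conv trunk.
        b, c, h, w = x.shape
        patches = x.flatten(2).transpose(1, 2)        # (batch, h*w, channels)
        q = self.query.expand(b, -1, -1)              # one query per image
        # Cross-attention: the query weights how each patch contributes.
        pooled, _ = self.attn(q, patches, patches)
        return self.norm(pooled.squeeze(1))           # (batch, channels)

pool = AttentionPooling(dim=64)
feats = torch.randn(2, 64, 7, 7)   # e.g. the final 7x7 feature map
out = pool(feats)
print(out.shape)  # torch.Size([2, 64])
```

Because the aggregation is a (softmax-normalized) weighted sum over patches, the attention weights can also be visualized directly as a per-patch saliency map, which is one motivation for this design over plain average pooling.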
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ImageNet | PatchConvNet-L120-21k-384 | Top-1 Accuracy (%) | 87.1 | — | Unverified |
| ImageNet | PatchConvNet-S60 | Top-1 Accuracy (%) | 82.1 | — | Unverified |
| ImageNet | PatchConvNet-S120 | Top-1 Accuracy (%) | 83.2 | — | Unverified |
| ImageNet | PatchConvNet-B60 | Top-1 Accuracy (%) | 83.5 | — | Unverified |
| ImageNet | PatchConvNet-B120 | Top-1 Accuracy (%) | 84.1 | — | Unverified |
| ImageNet | PatchConvNet-S60-21k-512 | Top-1 Accuracy (%) | 85.4 | — | Unverified |
| ImageNet | PatchConvNet-B60-21k-384 | Top-1 Accuracy (%) | 86.5 | — | Unverified |