
Augmenting Convolutional networks with attention-based aggregation

2021-12-27

Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou

Code Available

Abstract

We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We replace the final average pooling by an attention-based aggregation layer akin to a single transformer block, that weights how the patches are involved in the classification decision. We plug this learned aggregation layer with a simplistic patch-based convolutional network parametrized by 2 parameters (width and depth). In contrast with a pyramidal design, this architecture family maintains the input patch resolution across all the layers. It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption, as shown by our experiments on various computer vision tasks: object classification, image segmentation and detection.
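The core idea in the abstract can be sketched compactly: instead of averaging the patch features produced by the convolutional trunk, a learned class-token query attends over all patches and returns their weighted sum, so the weights expose how much each patch contributes to the classification decision. The sketch below is a minimal single-head illustration under assumed shapes and illustrative names (`attention_pool`, `w_k`, `w_v`), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(patches, query, w_k, w_v):
    """Aggregate N patch vectors into one vector with a single
    attention step, replacing global average pooling.
    patches : (N, d) patch features from the conv trunk
    query   : (d,)   learned class-token query (assumed trainable)
    w_k, w_v: (d, d) key/value projections (assumed trainable)
    """
    k = patches @ w_k                          # (N, d) keys
    v = patches @ w_v                          # (N, d) values
    scores = k @ query / np.sqrt(len(query))   # (N,) scaled similarity to query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax: one weight per patch
    return weights @ v                         # (d,) attention-weighted pooling

# Toy example: a 7x7 feature map with 16 channels -> 49 patch vectors.
d, n = 16, 49
patches = rng.standard_normal((n, d))
query = rng.standard_normal(d)
w_k = rng.standard_normal((d, d)) / np.sqrt(d)
w_v = rng.standard_normal((d, d)) / np.sqrt(d)
pooled = attention_pool(patches, query, w_k, w_v)
print(pooled.shape)  # (16,)
```

Because the model family keeps the input patch resolution constant through all layers, the same pooling applies regardless of depth; the softmax weights also double as a per-patch saliency map.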

Benchmark Results

Dataset    Model                       Metric           Claimed   Verified   Status
ImageNet   PatchConvNet-L120-21k-384   Top 1 Accuracy   87.1      —          Unverified
ImageNet   PatchConvNet-S60            Top 1 Accuracy   82.1      —          Unverified
ImageNet   PatchConvNet-S120           Top 1 Accuracy   83.2      —          Unverified
ImageNet   PatchConvNet-B60            Top 1 Accuracy   83.5      —          Unverified
ImageNet   PatchConvNet-B120           Top 1 Accuracy   84.1      —          Unverified
ImageNet   PatchConvNet-S60-21k-512    Top 1 Accuracy   85.4      —          Unverified
ImageNet   PatchConvNet-B60-21k-384    Top 1 Accuracy   86.5      —          Unverified
