Making Convolutional Networks Shift-Invariant Again

2019-04-25 · Code available

Richard Zhang

Abstract

Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification, across several commonly used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks. Code and anti-aliased versions of popular networks are available at https://richzhang.github.io/antialiased-cnns/ .
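The idea in the abstract (low-pass filter before subsampling, combined with max-pooling) can be illustrated on a toy 1D signal. The following is a minimal sketch and not the author's released code: it assumes circular padding, a window of 2, a stride of 2, and a normalized binomial filter, and every function name in it is hypothetical. The `filter_size` parameter is assumed to play the role of the `lpf2`/`lpf3`/`lpf5` suffixes in the benchmark table below.

```python
from math import comb


def binomial_kernel(size):
    """Normalized binomial low-pass filter, e.g. size 3 gives [0.25, 0.5, 0.25]."""
    weights = [comb(size - 1, k) for k in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def max_stride1(signal, window=2):
    """Max over a sliding window evaluated at every position (stride 1), circular padding."""
    n = len(signal)
    return [max(signal[(i + j) % n] for j in range(window)) for i in range(n)]


def blur(signal, kernel):
    """Circular convolution of the signal with a (symmetric) low-pass kernel."""
    n = len(signal)
    offset = (len(kernel) - 1) // 2
    return [
        sum(w * signal[(i + j - offset) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]


def max_pool(signal, stride=2, window=2):
    """Baseline: dense max followed directly by subsampling."""
    return max_stride1(signal, window)[::stride]


def max_blur_pool(signal, stride=2, window=2, filter_size=3):
    """Anti-aliased variant: dense max, then low-pass filter, then subsample."""
    dense = max_stride1(signal, window)
    return blur(dense, binomial_kernel(filter_size))[::stride]


def shift(signal, k):
    """Circularly shift the signal right by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


x = [0, 0, 1, 1, 0, 0, 1, 1]
print(max_pool(x), max_pool(shift(x, 1)))            # [0, 1, 0, 1] [1, 1, 1, 1]
print(max_blur_pool(x), max_blur_pool(shift(x, 1)))  # [0.5, 1.0, 0.5, 1.0] [0.75, 0.75, 0.75, 0.75]
```

On this toy input, a one-sample shift changes the plain max-pooled output by up to 1.0, while the blurred version changes by at most 0.25. The output is still not perfectly shift-invariant; the filter only reduces the aliasing introduced by subsampling.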

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| VizWiz-Classification | ResNet-101 (lpf3) | Accuracy - All Images | 41.7 | | Unverified |
| VizWiz-Classification | ResNet-50 (lpf5) | Accuracy - All Images | 41.5 | | Unverified |
| VizWiz-Classification | ResNet-101 (lpf2) | Accuracy - All Images | 41.1 | | Unverified |
| VizWiz-Classification | ResNet-101 (lpf5) | Accuracy - All Images | 41 | | Unverified |
| VizWiz-Classification | ResNet-50 (lpf2) | Accuracy - All Images | 40.3 | | Unverified |
| VizWiz-Classification | ResNet-50 (lpf3) | Accuracy - All Images | 40 | | Unverified |
| VizWiz-Classification | DenseNet121 (lpf5) | Accuracy - All Images | 38.7 | | Unverified |
| VizWiz-Classification | ResNet-34 (lpf2) | Accuracy - All Images | 38.3 | | Unverified |
| VizWiz-Classification | DenseNet-121 (lpf3) | Accuracy - All Images | 38.3 | | Unverified |
| VizWiz-Classification | ResNet-34 (lpf3) | Accuracy - All Images | 38.3 | | Unverified |
| VizWiz-Classification | DenseNet-121 (lpf2) | Accuracy - All Images | 38.3 | | Unverified |
| VizWiz-Classification | VGG-16 BN (lpf2) | Accuracy - All Images | 37.2 | | Unverified |
| VizWiz-Classification | ResNet-34 (lpf5) | Accuracy - All Images | 37.2 | | Unverified |
| VizWiz-Classification | VGG-16 BN (lpf5) | Accuracy - All Images | 37 | | Unverified |
| VizWiz-Classification | VGG-16 BN (lpf3) | Accuracy - All Images | 36.9 | | Unverified |
| VizWiz-Classification | MobileNetV2 (lpf3) | Accuracy - All Images | 36 | | Unverified |
| VizWiz-Classification | MobileNetV2 (lpf5) | Accuracy - All Images | 35.8 | | Unverified |
| VizWiz-Classification | ResNet-18 (lpf3) | Accuracy - All Images | 35.6 | | Unverified |
| VizWiz-Classification | MobileNetV2 (lpf2) | Accuracy - All Images | 35.5 | | Unverified |
| VizWiz-Classification | ResNet-18 (lpf2) | Accuracy - All Images | 35.5 | | Unverified |
| VizWiz-Classification | VGG-16 (lpf3) | Accuracy - All Images | 35.1 | | Unverified |
| VizWiz-Classification | ResNet-18 (lpf5) | Accuracy - All Images | 34.7 | | Unverified |
| VizWiz-Classification | VGG-16 (lpf5) | Accuracy - All Images | 34.5 | | Unverified |
| VizWiz-Classification | VGG-16 (lpf2) | Accuracy - All Images | 33.5 | | Unverified |
| VizWiz-Classification | AlexNet (lpf3) | Accuracy - All Images | 23.1 | | Unverified |
| VizWiz-Classification | AlexNet (lpf2) | Accuracy - All Images | 22.8 | | Unverified |
| VizWiz-Classification | AlexNet (lpf5) | Accuracy - All Images | 22.7 | | Unverified |