Container: Context Aggregation Network

2021-06-02Code Available1· sign in to hype

Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Code Available — Be the first to reproduce this paper.

Code

github.com/allenai/container
OfficialIn paperpytorch★ 57
github.com/gaopengcuhk/Container
Officialnone★ 46
github.com/alrhub/arturo
pytorch★ 4
github.com/philippdahlinger/ltsgns_ai4science
pytorch★ 1

Abstract

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers -- originally introduced in natural language processing -- have been increasingly adopted in computer vision. While early adopters continue to employ CNN backbones, the latest networks are end-to-end CNN-free Transformer solutions. A recent surprising finding shows that a simple MLP based solution without any traditional convolutional or Transformer components can produce effective visual representations. While CNNs, Transformers and MLP-Mixers may be considered as completely disparate architectures, we provide a unified view showing that they are in fact special cases of a more general method to aggregate spatial context in a neural network stack. We present the (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation that can exploit long-range interactions a la Transformers while still exploiting the inductive bias of the local convolution operation leading to faster convergence speeds, often seen in CNNs. In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named , can be employed in object detection and instance segmentation networks such as DETR, RetinaNet and Mask-RCNN to obtain an impressive detection mAP of 38.9, 43.8, 45.1 and mask mAP of 41.3, providing large improvements of 6.6, 7.3, 6.9 and 6.6 pts respectively, compared to a ResNet-50 backbone with a comparable compute and parameter size. Our method also achieves promising results on self-supervised learning compared to DeiT on the DINO framework. Code is released at https://github.com/allenai/container.

Tasks

Image Classification Inductive Bias Instance Segmentation object-detection Object Detection Self-Supervised Learning Semantic Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet	Container Container	Top 1 Accuracy	82.7	—	Unverified
ImageNet	Container-Light	Top 1 Accuracy	82	—	Unverified

Container: Context Aggregation Network

Code

Abstract

Tasks

Benchmark Results

Reproductions