Co-Scale Conv-Attentional Image Transformers
Weijian Xu, Yifan Xu, Tyler Chang, Zhuowen Tu
Code
- github.com/mlpc-ucsd/CoaT (official, in paper, PyTorch, ★ 235)
- github.com/rwightman/pytorch-image-models (PyTorch, ★ 36,538)
- github.com/naver-ai/vidt (PyTorch, ★ 318)
- github.com/rishikksh20/CoaT-pytorch (PyTorch, ★ 15)
- github.com/mindspore-courses/External-Attention-MindSpore/blob/main/model/backbone/CoaT.py (MindSpore, ★ 0)
- github.com/Mind23-2/MindCode-170 (MindSpore, ★ 0)
- github.com/BR-IDL/PaddleViT/tree/develop/image_classification/CoaT (Paddle, ★ 0)
- github.com/leondgarse/keras_cv_attention_models/tree/main/keras_cv_attention_models/coat (TensorFlow, ★ 0)
- gitlab.com/birder/birder (PyTorch, ★ 0)
Abstract
In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms. First, the co-scale mechanism maintains the integrity of Transformers' encoder branches at individual scales, while allowing representations learned at different scales to effectively communicate with each other; we design a series of serial and parallel blocks to realize the co-scale mechanism. Second, we devise a conv-attentional mechanism by realizing a relative position embedding formulation in the factorized attention module with an efficient convolution-like implementation. CoaT empowers image Transformers with enriched multi-scale and contextual modeling capabilities. On ImageNet, relatively small CoaT models attain superior classification results compared with similar-sized convolutional neural networks and image/vision Transformers. The effectiveness of CoaT's backbone is also illustrated on object detection and instance segmentation, demonstrating its applicability to downstream computer vision tasks.
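The conv-attentional mechanism described in the abstract combines two ideas: factorized attention (softmax over keys, aggregated into a small context matrix, then queried), which is linear rather than quadratic in token count, and a relative position embedding realized as a depthwise convolution over the value map, gated by the queries. The sketch below, in PyTorch, illustrates that combination; the class name, shapes, and single-kernel depthwise convolution are illustrative assumptions, not the authors' exact implementation (see the official repo above for that).

```python
import torch
import torch.nn as nn


class FactorizedConvAttention(nn.Module):
    """Sketch of a conv-attentional block in the spirit of CoaT:
    factorized attention plus a depthwise-convolutional relative
    position encoding. Illustrative, not the official implementation."""

    def __init__(self, dim, num_heads=8, kernel_size=3):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Depthwise conv over the value feature map plays the role of a
        # (local) relative position encoding with a conv-like implementation.
        self.crpe = nn.Conv2d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)

    def forward(self, x, h, w):
        # x: (batch, n, dim) with n == h * w image tokens (no CLS token here).
        b, n, c = x.shape
        qkv = (self.qkv(x)
               .reshape(b, n, 3, self.num_heads, self.head_dim)
               .permute(2, 0, 3, 1, 4))
        q, k, v = qkv[0], qkv[1], qkv[2]   # each: (b, heads, n, head_dim)

        # Factorized attention: O(n) in token count instead of O(n^2).
        k = k.softmax(dim=2)               # softmax over the token axis
        context = k.transpose(-2, -1) @ v  # (b, heads, head_dim, head_dim)
        factor_att = q @ context           # (b, heads, n, head_dim)

        # Convolutional relative position encoding: depthwise conv on V,
        # gated elementwise by Q.
        v_img = v.transpose(-2, -1).reshape(b, c, h, w)
        crpe = (self.crpe(v_img)
                .reshape(b, self.num_heads, self.head_dim, n)
                .transpose(-2, -1))        # back to (b, heads, n, head_dim)

        out = self.scale * factor_att + q * crpe
        out = out.transpose(1, 2).reshape(b, n, c)
        return self.proj(out)
```

A quick shape check: for a 4×4 token grid with embedding dimension 64, an input of shape `(2, 16, 64)` passes through and returns the same shape, so the module can drop into a Transformer encoder in place of standard self-attention.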