Convolutional Xformers for Vision

2022-01-25

Pranav Jeevan, Amit Sethi


Code

github.com/pranavphoenix/cxv
Official · In paper · PyTorch · ★ 26

Abstract

Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-of-the-art accuracy on certain benchmarks. The reasons for their limited use include their need for larger training datasets and more computational resources compared to convolutional neural networks (CNNs), owing to the quadratic complexity of their self-attention mechanism. We propose a linear attention-convolution hybrid architecture, Convolutional X-formers for Vision (CXV), to overcome these limitations. We replace the quadratic attention with linear attention mechanisms, such as Performer, Nyströmformer, and Linear Transformer, to reduce GPU usage. Convolutional sub-layers provide the inductive prior for image data, thereby eliminating the need for the class token and positional embeddings used by ViTs. We also propose a new training method in which we use two different optimizers during different phases of training, and show that it improves the top-1 image classification accuracy across different architectures. CXV outperforms other architectures, token mixers (e.g. ConvMixer, FNet and MLP Mixer), transformer models (e.g. ViT, CCT, CvT and hybrid Xformers), and ResNets for image classification in scenarios with limited data and GPU resources (cores, RAM, power).
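
To make the "quadratic versus linear attention" contrast in the abstract concrete, here is a minimal, dependency-free sketch of kernelized linear attention in the style of the Linear Transformer. It is not taken from the CXV repository: the function names (`feature_map`, `quadratic_attention`, `linear_attention`) and the toy sizes are assumptions made for illustration, and the elu(x) + 1 feature map is the Linear Transformer's choice. Performer and Nyströmformer approximate attention by different means that are not shown here.

```python
import math
import random


def feature_map(x):
    """elu(x) + 1, a strictly positive feature map (Linear Transformer style)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def quadratic_attention(Q, K, V):
    """Reference form: explicitly computes all n * n query-key similarities."""
    phi_q = [feature_map(q) for q in Q]
    phi_k = [feature_map(k) for k in K]
    dv = len(V[0])
    out = []
    for q in phi_q:
        # One similarity score per key: n scores for each of the n queries.
        scores = [sum(a * b for a, b in zip(q, k)) for k in phi_k]
        total = sum(scores)
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / total
                    for c in range(dv)])
    return out


def linear_attention(Q, K, V):
    """Same result, reordered so no n * n matrix is ever formed."""
    phi_q = [feature_map(q) for q in Q]
    phi_k = [feature_map(k) for k in K]
    d, dv = len(phi_k[0]), len(V[0])
    # kv[a][c] = sum_j phi_k[j][a] * V[j][c]: a d x dv summary of all keys/values.
    kv = [[sum(k[a] * v[c] for k, v in zip(phi_k, V)) for c in range(dv)]
          for a in range(d)]
    # k_sum[a] = sum_j phi_k[j][a], used for the per-query normalizer.
    k_sum = [sum(k[a] for k in phi_k) for a in range(d)]
    out = []
    for q in phi_q:
        norm = sum(q[a] * k_sum[a] for a in range(d))
        out.append([sum(q[a] * kv[a][c] for a in range(d)) / norm
                    for c in range(dv)])
    return out


# Toy check on random data (hypothetical sizes: 6 tokens, dimension 4).
rng = random.Random(0)
n, d = 6, 4
Q = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
K = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
V = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

ref = quadratic_attention(Q, K, V)
lin = linear_attention(Q, K, V)
max_diff = max(abs(a - b) for r, l in zip(ref, lin) for a, b in zip(r, l))
```

Both functions compute the same outputs up to floating-point rounding. The difference is the order of multiplication: the reference form scores every query against every key, so its cost grows with the square of the number of tokens, while the reordered form summarizes keys and values once and then reuses that summary for each query, so its cost grows linearly with the number of tokens.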

Tasks

GPU, Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CIFAR-10	Convolutional Performer for Vision (CPV)	Percentage correct	94.46	—	Unverified
CIFAR-100	Convolutional Linear Transformer for Vision (CLTV)	Percentage correct	60.11	—	Unverified
Tiny ImageNet Classification	Convolutional Nystromformer for Vision (CNV)	Validation Acc	49.56	—	Unverified
