SparseSwin: Swin Transformer with Sparse Transformer Block

2023-09-11

Krisna Pinasthika, Blessius Sheldo Putra Laksono, Riyandi Banovbi Putera Irsal, Syifa Hukma Shabiyya, Novanto Yudistira


Code

github.com/krisnapinasthika/sparseswin
Official · In paper · PyTorch · ★ 16

Abstract

Advances in computer vision research have established the transformer architecture as the state of the art in computer vision tasks. One known drawback of the transformer architecture is its high number of parameters, which can lead to a more complex and inefficient algorithm. This paper aims to reduce the number of parameters and, in turn, make the transformer more efficient. We present the Sparse Transformer (SparTa) Block, a modified transformer block with an added sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin-T architecture (SparseSwin) to leverage Swin's ability to downsample its input and reduce the number of initial tokens to be calculated. The proposed SparseSwin model outperforms other state-of-the-art models in image classification, with accuracies of 86.96%, 97.43%, and 85.35% on the ImageNet100, CIFAR10, and CIFAR100 datasets, respectively. Despite the model's fewer parameters, this result highlights the potential of a transformer architecture that uses a sparse token converter with a limited number of tokens to optimize the use of the transformer and improve its performance.
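
To make the token-reduction argument concrete, the following is a minimal arithmetic sketch and not the authors' implementation. It assumes a 224×224 input, a 4×4 patch embedding, and a halving of the feature-map side at each later stage, as in the standard Swin-T configuration. The function names and the value `REDUCED_TOKENS = 49` are hypothetical and chosen only for illustration; the abstract does not state how many tokens the sparse token converter outputs. The sketch counts query-key score entries for attention computed over a whole token set, which grows quadratically with the number of tokens.

```python
from typing import List


def swin_stage_tokens(image_size: int = 224, patch_size: int = 4,
                      stages: int = 4) -> List[int]:
    """Token count per stage of a Swin-style hierarchy.

    Assumes a square input and that each stage after the first
    halves the feature-map side length (patch merging).
    """
    side = image_size // patch_size
    counts = []
    for _ in range(stages):
        counts.append(side * side)
        side //= 2
    return counts


def attention_pairs(num_tokens: int) -> int:
    """Query-key score entries for one attention head over num_tokens tokens."""
    return num_tokens * num_tokens


tokens = swin_stage_tokens()
print(tokens)  # [3136, 784, 196, 49]

# Hypothetical fixed token budget produced by a sparse token converter;
# this value is an assumption, not taken from the abstract.
REDUCED_TOKENS = 49

dense = attention_pairs(tokens[2])        # attention over all stage-3 tokens
sparse = attention_pairs(REDUCED_TOKENS)  # attention over the reduced set
print(dense, sparse, dense // sparse)     # 38416 2401 16
```

Under these assumptions, shrinking a 196-token set to 49 tokens cuts the number of attention scores by a factor of 16, which is the kind of saving a fixed, small token budget is meant to provide.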

Tasks

Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CIFAR-10	SparseSwin	Percentage correct	97.43	—	Unverified
CIFAR-100	SparseSwin	Percentage correct	85.35	—	Unverified
ImageNet-100 (TEMI Split)	SparseSwin with L2	Percentage correct	86.96	—	Unverified
