Query2Label: A Simple Transformer Way to Multi-Label Classification

2021-07-22

Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu


Code

github.com/SlongLiu/query2labels
Official, in paper · PyTorch · ★ 463
github.com/curt-tigges/query2label
PyTorch · ★ 16
github.com/averyfallson/rmffn
PyTorch · ★ 3

Abstract

This paper presents a simple and effective approach to solving the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of Transformers is rooted in the need to extract local discriminative features adaptively for different labels, which is a strongly desired property because one image can contain multiple objects. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embeddings as queries to probe and pool class-related features from a feature map computed by a vision backbone for subsequent binary classifications. Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective, consistently outperforming all previous works on five multi-label classification data sets, including MS-COCO, PASCAL VOC, NUS-WIDE, and Visual Genome. In particular, we establish 91.3% mAP on MS-COCO. We hope its compact structure, simple implementation, and superior performance serve as a strong baseline for multi-label classification tasks and future studies. The code will be available soon at https://github.com/SlongLiu/query2labels.
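The query-and-pool step described in the abstract can be sketched as follows: each label embedding acts as a query that cross-attends over the flattened backbone feature map, and the pooled vector feeds a binary classifier for that label. This is a minimal, framework-free sketch under loudly stated assumptions, not the official implementation: single-head scaled dot-product attention with no learned query/key/value projections, no self-attention among the label queries, no feed-forward sublayer, a single decoder layer, and a hypothetical per-label linear classifier (`classifier_w`, `classifier_b`). All function names are illustrative and are not taken from the query2labels repository.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def flatten_feature_map(fmap: List[Matrix]) -> Matrix:
    """Turn an H x W grid of d-dim backbone features into H*W spatial tokens."""
    return [token for row in fmap for token in row]


def cross_attention_pool(queries: Matrix, tokens: Matrix) -> Matrix:
    """Single-head cross-attention WITHOUT projections (a simplification):
    each label query attends over all spatial tokens and returns the
    attention-weighted sum of those tokens. Queries and tokens share dim d."""
    d = len(queries[0])
    pooled = []
    for q in queries:
        weights = softmax([dot(q, t) / math.sqrt(d) for t in tokens])
        pooled.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return pooled


def label_probabilities(queries: Matrix, tokens: Matrix,
                        classifier_w: Matrix, classifier_b: Vector) -> Vector:
    """Pool one feature vector per label, then apply that label's own
    (hypothetical) linear classifier and a sigmoid: one binary decision per label."""
    pooled = cross_attention_pool(queries, tokens)
    return [sigmoid(dot(w, p) + b) for w, p, b in zip(classifier_w, pooled, classifier_b)]


if __name__ == "__main__":
    random.seed(0)
    d, height, width, num_labels = 4, 2, 3, 3

    def rand_vec() -> Vector:
        return [random.uniform(-1.0, 1.0) for _ in range(d)]

    fmap = [[rand_vec() for _ in range(width)] for _ in range(height)]  # stand-in backbone output
    label_embeddings = [rand_vec() for _ in range(num_labels)]           # one query per label
    classifier_w = [rand_vec() for _ in range(num_labels)]               # hypothetical weights
    classifier_b = [0.0] * num_labels

    probs = label_probabilities(label_embeddings, flatten_feature_map(fmap),
                                classifier_w, classifier_b)
    for k, p in enumerate(probs):
        print(f"label {k}: p = {p:.3f}")
```

Because every label has its own query, each label can pool from different spatial locations of the feature map, which is the property the abstract motivates with images that contain multiple objects. The per-label sigmoid (rather than a softmax over labels) keeps the decisions independent, so any number of labels can be predicted as present.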

Tasks

Classification, Decoder, Multi-Label Classification

Benchmark Results

Dataset	Model	Metric	Claimed (%)	Verified (%)	Status
MS-COCO	Q2L-TResL (ImageNet-21K pretraining, resolution 640)	mAP	90.3	—	Unverified
MS-COCO	Q2L-CvT (ImageNet-21K pretraining, resolution 384)	mAP	91.3	—	Unverified
MS-COCO	Q2L-SwinL (ImageNet-21K pretraining, resolution 384)	mAP	90.5	—	Unverified
MS-COCO	Q2L-R101 (resolution 448)	mAP	84.9	—	Unverified
NUS-WIDE	Q2L-TResL (resolution 448)	mAP	66.3	—	Unverified
NUS-WIDE	Q2L-CvT (ImageNet-21K pretraining, resolution 384)	mAP	70.1	—	Unverified
NUS-WIDE	Q2L-R101 (resolution 448)	mAP	65	—	Unverified
PASCAL VOC 2007	Q2L-TResL (resolution 448)	mAP	96.1	—	Unverified
PASCAL VOC 2007	Q2L-CvT (ImageNet-21K pretraining, resolution 384)	mAP	97.3	—	Unverified
PASCAL VOC 2007	Q2L-TResL (ImageNet-21K pretraining, resolution 448)	mAP	96.9	—	Unverified
PASCAL VOC 2012	Q2L-TResL (resolution 448)	mAP	96.2	—	Unverified
