ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

2020-05-15 · ECCV 2020

Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang

Code

github.com/Jarr0d/ViTAA
Official · In paper · pytorch · ★ 40
github.com/Jarr0d/Human-Parsing-Network
pytorch · ★ 17

Abstract

Person search by natural language aims to retrieve a specific person from a large-scale image pool that matches a given textual description. While most current methods treat the task as holistic visual and textual feature matching, we approach it from an attribute-aligning perspective that allows grounding specific attribute phrases to the corresponding visual regions. This yields both successful retrieval and a performance boost through robust feature learning, in which the referred identity can be accurately bundled by multiple attribute visual cues. Concretely, our Visual-Textual Attribute Alignment model (dubbed ViTAA) learns to disentangle the feature space of a person into subspaces corresponding to attributes using a light auxiliary attribute segmentation computing branch. It then aligns these visual features with the textual attributes parsed from the sentences using a novel contrastive learning loss. We validate the ViTAA framework through extensive experiments on person search by natural language and by attribute-phrase queries, on which our system achieves state-of-the-art performance. Code will be publicly available upon publication.
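
To make the idea of aligning visual and textual attribute features with a contrastive loss more concrete, here is a minimal sketch in plain Python. It uses a generic InfoNCE-style loss as a stand-in and is not the loss proposed in the paper. The function names (`cosine`, `info_nce`), the `temperature` value, and the toy 2-D feature vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(visual, textual, temperature=0.1):
    """Mean InfoNCE loss, assuming visual[i] and textual[i] are a matched pair.

    Each visual attribute feature is pulled toward its own textual attribute
    feature and pushed away from the other textual features in the batch.
    """
    losses = []
    for i, v in enumerate(visual):
        logits = [cosine(v, t) / temperature for t in textual]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matched textual feature.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy features (hypothetical): two attributes, each with a 2-D embedding.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this toy setup the loss is close to zero when each visual feature matches its paired textual feature and large when the pairs are swapped, which is the behaviour a contrastive alignment objective is meant to reward.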

Tasks

Attribute, Contrastive Learning, Person Search, Text based Person Retrieval

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CUHK-PEDES	ViTAA	R@1	55.97	—	Unverified
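
The R@1 figure in the table is a rank-1 recall, reported as a percentage. As a rough illustration of how such a number can be computed, the sketch below counts a text query as a hit when at least one of its top-k ranked gallery images shares the query's person identity. The function name `recall_at_k`, the similarity matrix, and the identity labels are hypothetical, and the exact evaluation protocol used for CUHK-PEDES is an assumption here.

```python
def recall_at_k(similarity, query_ids, gallery_ids, k=1):
    """Percentage of queries with a correct identity among the top-k results.

    similarity[q][g] is the score between text query q and gallery image g;
    higher means more similar.
    """
    hits = 0
    for row, qid in zip(similarity, query_ids):
        # Gallery indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if any(gallery_ids[j] == qid for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(query_ids)


# Toy data (hypothetical): 3 text queries scored against 3 gallery images.
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.6, 0.5, 0.1],
]
query_ids = [7, 8, 9]
gallery_ids = [7, 8, 9]

r1 = recall_at_k(sim, query_ids, gallery_ids, k=1)
r2 = recall_at_k(sim, query_ids, gallery_ids, k=2)
print(round(r1, 2), round(r2, 2))
```

Only the first query ranks its own identity first, so R@1 covers one query in three; widening to k=2 also recovers the second query.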
