Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

2021-11-24 · CVPR 2022 · Code Available

Xiaoxue Chen, Tianyu Liu, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

Code

github.com/open-air-sun/cerberus
Official · In paper · PyTorch · ★ 63

Abstract

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity of different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. However, successfully resolving it requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of weakly supervised learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success is credited to common attention maps across tasks. Code and models can be accessed at https://github.com/OPEN-AIR-SUN/Cerberus.

Tasks

Attribute, Scene Understanding, Semantic Segmentation, Weakly-supervised Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
NYU-Depth V2	Cerberus	Mean IoU	50.4	—	Unverified
