MVT: Multi-view Vision Transformer for 3D Object Recognition

2021-10-25

Shuo Chen, Tan Yu, Ping Li

Code

github.com/shanshuo/MVT (official, PyTorch)
github.com/shanshuo/R2-MLP (PyTorch)

Abstract

Inspired by the success of CNNs in image recognition, view-based methods apply CNNs to the 2D views projected from a 3D object and achieve excellent performance in 3D object understanding. Nevertheless, multi-view CNN models cannot model the interaction between patches from different views, which limits their effectiveness in 3D object recognition. Motivated by the recent success of vision Transformers in image recognition, we propose a Multi-view Vision Transformer (MVT) for 3D object recognition. Since each patch feature in a Transformer block has a global receptive field, MVT naturally enables communication between patches from different views. Moreover, it relies on much less inductive bias than its CNN counterparts. To balance effectiveness and efficiency, we develop a global-local structure for MVT. Experiments on two public benchmarks, ModelNet40 and ModelNet10, demonstrate the competitive performance of MVT.
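The two structural ideas in the abstract (cross-view communication through global attention, and a global-local split for efficiency) can be illustrated with a toy NumPy sketch. This is not the authors' implementation (see the official repository above). The function names, the single-head attention with a residual connection but no LayerNorm or MLP, and the assumption that a "local" stage attends only within one view while a "global" stage attends over the patches of all views are illustrative choices made here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a (T, D) token matrix."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T, T), rows sum to 1
    return weights @ v, weights


def local_block(views, wq, wk, wv):
    """Hypothetical local stage: each view's patches attend only to the same view.
    views: (V, N, D) = views x patches per view x feature dim."""
    return np.stack([view + self_attention(view, wq, wk, wv)[0] for view in views])


def global_block(views, wq, wk, wv):
    """Hypothetical global stage: flatten all V*N patches so every patch
    can attend to patches from every view (global receptive field)."""
    V, N, D = views.shape
    tokens = views.reshape(V * N, D)
    out, weights = self_attention(tokens, wq, wk, wv)
    return (tokens + out).reshape(V, N, D), weights


rng = np.random.default_rng(0)
V, N, D = 4, 16, 32  # illustrative sizes: 4 views, 16 patches per view, 32-dim features
views = rng.standard_normal((V, N, D))
w_local = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3)]
w_global = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3)]

h = local_block(views, *w_local)
h, attn = global_block(h, *w_global)
print(attn.shape)  # (64, 64)
# Share of attention that patch 0 of view 0 places on patches from the other views:
print(attn[0, N:].sum())
# Attention-score entries per stage: local V*N^2 vs. global (V*N)^2
print(V * N * N, (V * N) ** 2)  # 1024 4096
```

In this toy setting the global stage computes a (V·N) × (V·N) attention matrix, V times as many scores as the V separate N × N matrices of the local stage. Running cheaper per-view attention in some blocks and cross-view attention in others is one plausible reading of the effectiveness/efficiency trade-off the abstract mentions; the paper itself should be consulted for the actual block arrangement.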

Tasks

3D Object Recognition, Inductive Bias, Object, Object Recognition
