BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

2022-11-18 · CVPR 2023

Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

Code

github.com/fundamentalvision/BEVFormer
PyTorch, ★ 4,376
github.com/opengvlab/internimage
PyTorch, ★ 2,803

Abstract

We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones. Existing state-of-the-art BEV detectors are often tied to certain depth pre-trained backbones like VoVNet, hindering the synergy between booming image backbones and BEV detectors. To address this limitation, we prioritize easing the optimization of BEV detectors by introducing perspective space supervision. To this end, we propose a two-stage BEV detector, where proposals from the perspective head are fed into the bird's-eye-view head for final predictions. To evaluate the effectiveness of our model, we conduct extensive ablation studies focusing on the form of supervision and the generality of the proposed detector. The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset. The code shall be released soon.
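
The two-stage flow described in the abstract can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not the authors' implementation: the `Proposal` type, the top-k and score filtering, the mixing of proposal centres with a fixed set of "learned" reference points, and the stub `bev_head` are all hypothetical names and choices introduced here. The abstract states only that proposals from the perspective head are fed into the bird's-eye-view head.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Proposal:
    """A first-stage (perspective-view) detection, assumed already lifted to the BEV plane."""
    x: float      # metres in an ego-centred frame (hypothetical convention)
    y: float
    score: float  # confidence assigned by the perspective head


def select_proposals(proposals: List[Proposal], top_k: int, min_score: float) -> List[Proposal]:
    """Keep at most top_k proposals whose score is at least min_score, most confident first."""
    kept = [p for p in proposals if p.score >= min_score]
    kept.sort(key=lambda p: p.score, reverse=True)
    return kept[:top_k]


def build_bev_queries(proposals: List[Proposal], learned_points: List[Point]) -> List[Point]:
    """Assumed query layout: proposal centres first, then fixed learned reference points."""
    return [(p.x, p.y) for p in proposals] + list(learned_points)


def bev_head(queries: List[Point]) -> List[Dict[str, Point]]:
    """Stand-in for the second stage; a real head would refine each query into a 3D box."""
    return [{"ref_point": q} for q in queries]


# Toy first-stage output (made-up values for illustration only).
proposals = [
    Proposal(10.0, 2.0, 0.9),
    Proposal(-5.0, 7.5, 0.3),
    Proposal(22.0, -1.0, 0.6),
]
selected = select_proposals(proposals, top_k=2, min_score=0.5)
queries = build_bev_queries(selected, learned_points=[(0.0, 0.0)])
predictions = bev_head(queries)
print(len(predictions), "queries passed to the BEV head")
```

The point of the sketch is the data flow: the first stage narrows the search by handing the second stage concrete locations to refine, while the extra learned points leave room for objects the first stage missed.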

Tasks

3D Object Detection

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
nuScenes Camera Only	BEVFormer v2 (InternImage-XL)	NDS	63.4	—	Unverified
Rope3D	BEVFormer	AP@0.7	24.64	—	Unverified
