DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
Code
- github.com/IDEACVR/DINO (official, in paper; pytorch, ★ 2,765)
- github.com/IDEA-Research/Grounded-Segment-Anything (pytorch, ★ 17,472)
- github.com/lucasjinreal/yolov7_d2 (pytorch, ★ 3,114)
- github.com/idea-research/dino (pytorch, ★ 2,765)
- github.com/IDEA-Research/detrex (pytorch, ★ 2,274)
- github.com/alibaba/EasyCV (pytorch, ★ 1,949)
- github.com/IDEACVR/MaskDINO (pytorch, ★ 1,505)
- github.com/idea-research/maskdino (pytorch, ★ 1,505)
- github.com/NVlabs/FasterViT (pytorch, ★ 911)
- github.com/idea-research/dn-detr (pytorch, ★ 604)
Abstract
We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive approach to denoising training, a mixed query selection method for anchor initialization, and a look-forward-twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding significant improvements of +6.0 AP and +2.7 AP, respectively, over DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017 (63.2 AP) and test-dev (63.3 AP). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results. Our code will be available at https://github.com/IDEACVR/DINO.
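The abstract gives DINO's AP and its gain over DN-DETR but not the DN-DETR baseline itself. The short sketch below recovers the implied baseline by subtraction, using only the numbers quoted in the abstract. The dictionary and function names are illustrative and do not come from the paper or its code, and the implied values are derived arithmetic, not independently reported results.

```python
# DINO results quoted in the abstract (COCO, ResNet-50, multi-scale features),
# keyed by training epochs.
dino_ap = {12: 49.4, 24: 51.3}

# Gains over DN-DETR quoted in the abstract for the same schedules.
gain_over_dn_detr = {12: 6.0, 24: 2.7}


def implied_baseline_ap(epochs: int) -> float:
    """Return the DN-DETR AP implied by the abstract (DINO AP minus stated gain)."""
    # Round to one decimal place to match the precision used in the abstract.
    return round(dino_ap[epochs] - gain_over_dn_detr[epochs], 1)


implied = {epochs: implied_baseline_ap(epochs) for epochs in dino_ap}

for epochs, baseline in implied.items():
    print(
        f"{epochs} epochs: DINO {dino_ap[epochs]} AP, "
        f"implied DN-DETR baseline {baseline} AP"
    )
```

The gap narrows from 6.0 AP at 12 epochs to 2.7 AP at 24 epochs, which suggests that much of DINO's stated advantage on this setup shows up early in training.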
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| COCO minival | DINO (Swin-L) | box AP | 63.2 | — | Unverified |
| COCO minival | DINO-5scale (24 epoch) | box AP | 51.3 | — | Unverified |
| COCO minival | DINO-5scale (36 epoch) | box AP | 51.2 | — | Unverified |
| COCO-O | DINO (Swin-L) | Average mAP | 42.1 | — | Unverified |
| COCO test-dev | DINO (Swin-L, multi-scale, TTA) | box mAP | 63.3 | — | Unverified |
| SA-Det-100k | DINO (ResNet50 1x VFL) | AP | 43.7 | — | Unverified |