DirectPose: Direct End-to-End Multi-Person Pose Estimation
Zhi Tian, Hao Chen, Chunhua Shen
Code
- github.com/aim-uofa/adet (PyTorch, ★ 3,474)
- github.com/aim-uofa/AdelaiDet (PyTorch, ★ 3,474)
- github.com/IDEA-Research/UniPose (PyTorch, ★ 786)
- github.com/idea-research/x-pose (PyTorch, ★ 786)
- github.com/Pxtri2156/AdelaiDet_v2 (PyTorch, ★ 5)
- github.com/blueardour/AdelaiDet (PyTorch, ★ 5)
- github.com/zhaozhijie1997/Unifed-Lane-and-Traffic-Sign-detection (PyTorch, ★ 4)
- github.com/quangvy2703/ABCNet-ESRGAN-SRTEXT (PyTorch, ★ 4)
- github.com/zhubinQAQ/Ins (PyTorch, ★ 2)
Abstract
We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose. Inspired by recent anchor-free object detectors, which directly regress the two corners of target bounding boxes, the proposed framework directly predicts instance-aware keypoints for all instances from a raw input image, eliminating the need for heuristic grouping in bottom-up methods or for bounding-box detection and RoI operations in top-down ones. We also propose a novel Keypoint Alignment (KPAlign) mechanism, which overcomes the main difficulty in this end-to-end framework: the lack of alignment between the convolutional features and the predictions. KPAlign improves the framework's performance by a large margin while keeping the framework end-to-end trainable. With non-maximum suppression (NMS) as the only post-processing step, our proposed framework can detect multi-person keypoints with or without bounding boxes in a single shot. Experiments demonstrate that the end-to-end paradigm can achieve competitive or better performance than previous strong baselines among both bottom-up and top-down methods. We hope that our end-to-end approach can provide a new perspective for the human pose estimation task.
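The abstract names non-maximum suppression (NMS) as the only post-processing step. As a rough illustration of what that step does, the following is a minimal sketch of greedy, box-based NMS in plain Python. It is not the authors' implementation: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are all assumptions made for this example, and the abstract notes that the framework can also run without bounding boxes, a case this sketch does not cover.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept detections, highest score first.

    A detection is kept only if its IoU with every already-kept,
    higher-scoring detection is at or below iou_threshold (an illustrative
    default, not a value taken from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two overlap heavily, the third is separate.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept = nms(example_boxes, example_scores)
```

In a pose estimator, each kept index would also select the keypoint set predicted for that instance, so duplicate poses for the same person are dropped together with their boxes.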
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| COCO test-dev | DirectPose (ResNet-101) | AP | 63.3 | — | Unverified |