RANet: Ranking Attention Network for Fast Video Object Segmentation
Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, Ling Shao
Code
- github.com/Storife/RANet (official implementation, PyTorch) ★ 0
- github.com/v1viswan/RANet_modifications (PyTorch) ★ 0
Abstract
Although online learning (OL) techniques have boosted the performance of semi-supervised video object segmentation (VOS) methods, the huge time cost of OL greatly restricts their practicality. Matching-based and propagation-based methods run faster by avoiding OL, but they are limited to sub-optimal accuracy due to mismatching and drifting problems. In this paper, we develop a real-time yet highly accurate Ranking Attention Network (RANet) for VOS. Specifically, to integrate the insights of matching-based and propagation-based methods, we employ an encoder-decoder framework to learn pixel-level similarity and segmentation in an end-to-end manner. To better utilize the similarity maps, we propose a novel ranking attention module, which automatically ranks and selects these maps for fine-grained VOS performance. Experiments on the DAVIS-16 and DAVIS-17 datasets show that our RANet achieves the best speed-accuracy trade-off, e.g., 33 milliseconds per frame and J&F=85.5% on DAVIS-16. With OL, our RANet reaches J&F=87.1% on DAVIS-16, exceeding state-of-the-art VOS methods. The code can be found at https://github.com/Storife/RANet.
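The ranking attention module described above can be illustrated with a minimal sketch: given a stack of pixel-level similarity maps, each map is assigned an importance score, the maps are reordered by that score, and a fixed number of top-ranked maps is kept (zero-padded if too few exist) so the decoder always sees a fixed channel layout. This is an assumption-laden toy in NumPy, not the paper's implementation; in RANet the scores come from a learned convolutional branch, whereas here a simple peak-response proxy stands in for them.

```python
import numpy as np

def ranking_attention(sim_maps, scores, k):
    """Rank similarity maps by importance score (descending), keep the
    top-k, and zero-pad to a fixed channel count. Illustrative sketch:
    RANet learns the scores with a small conv branch instead."""
    order = np.argsort(-scores)            # indices of maps, best first
    ranked = sim_maps[order][:k]           # select the k highest-scored maps
    if ranked.shape[0] < k:                # pad when fewer than k maps exist
        pad = np.zeros((k - ranked.shape[0],) + ranked.shape[1:])
        ranked = np.concatenate([ranked, pad], axis=0)
    return ranked

# Toy example: 5 similarity maps of size 4x4, scored by peak response
# (a hypothetical stand-in for the learned ranking scores).
rng = np.random.default_rng(0)
maps = rng.random((5, 4, 4))
scores = maps.reshape(5, -1).max(axis=1)
selected = ranking_attention(maps, scores, k=3)
print(selected.shape)  # fixed-size output: (3, 4, 4)
```

The fixed output size is the point of the module: the number of foreground pixels in the first frame (and hence the number of similarity maps) varies per video, while a convolutional decoder needs a constant channel count.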
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| DAVIS 2016 | RANet+ (online learning) | J&F | 87.1 | — | Unverified |
| DAVIS 2016 | RANet | J&F | 85.5 | — | Unverified |
| DAVIS 2017 (test-dev) | RANet | J&F | 55.4 | — | Unverified |
| DAVIS 2017 (val) | RANet (no YouTube-VOS training) | J&F | 65.7 | — | Unverified |