Semi-Supervised Video Object Segmentation
The semi-supervised scenario assumes that the user provides a full mask of the object(s) of interest in the first frame of a video sequence. Methods must then produce the segmentation masks for those object(s) in the subsequent frames.
Datasets: DAVIS 2017 (val), DAVIS 2016, DAVIS-2017 (test-dev), YouTube-VOS 2018, DAVIS (no YouTube-VOS training), YouTube-VOS 2019, VOT2020, MOSE, Long Video Dataset, YouTube, DAVIS 2017, BURST-test
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SwinB-AOTv2-L (MS) | J&F | 93 | — | Unverified |
| 2 | SwinB-AOST (L'=3, MS) | J&F | 93 | — | Unverified |
| 3 | SwinB-DeAOT-L | J&F | 92.9 | — | Unverified |
| 4 | XMem (MS) | J&F | 92.7 | — | Unverified |
| 5 | SwinB-AOTv2-L | J&F | 92.4 | — | Unverified |
| 6 | SwinB-AOST (L'=3) | J&F | 92.4 | — | Unverified |
| 7 | R50-DeAOT-L | J&F | 92.3 | — | Unverified |
| 8 | R50-AOST (L'=3) | J&F | 92.1 | — | Unverified |
| 9 | XMem (BL30K) | J&F | 92 | — | Unverified |
| 10 | DeAOT-L | J&F | 92 | — | Unverified |
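
The table reports J&F, which on DAVIS-style benchmarks is the mean of region similarity J (the intersection-over-union of predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The following is a minimal sketch of that arithmetic, not the official evaluation code: the function names `jaccard` and `j_and_f` are hypothetical, masks are plain nested lists of 0/1, the boundary-based F computation is omitted and taken as a given number, and treating two empty masks as a perfect match is an assumed convention.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks.

    `pred` and `gt` are equally sized nested lists of 0/1 values
    (an illustrative format, not any benchmark's file format).
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean, f_mean):
    """J&F: arithmetic mean of the J score and the F score."""
    return (j_mean + f_mean) / 2.0


# One frame: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]

j = jaccard(pred, gt)        # 2 / 4 = 0.5
score = j_and_f(j, 0.75)     # F = 0.75 is a made-up value for illustration
print(j, score)              # prints "0.5 0.625"
```

Leaderboards such as the one above quote these scores as percentages, so a J&F of 0.625 in this sketch would be listed as 62.5. In a full evaluation, J and F would be averaged over all annotated frames and objects before being combined.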