BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

2022-08-01

Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, Mei Chen

Abstract

Video Object Segmentation (VOS) is fundamental to video understanding. Transformer-based methods have significantly improved performance on semi-supervised VOS. However, existing work struggles to segment visually similar objects in close proximity to each other. In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries. This calibrated optical flow is then employed in our novel bilateral attention, which computes the correspondence between the query and reference frames in the neighboring bilateral space considering both motion and appearance. Extensive experiments validate the effectiveness of the BATMAN architecture, which outperforms all existing state-of-the-art methods on all four popular VOS benchmarks: YouTube-VOS 2019 (85.0%), YouTube-VOS 2018 (85.3%), DAVIS 2017 Val/Test-dev (86.2%/82.2%), and DAVIS 2016 (92.5%).
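The abstract does not spell out how the neighboring bilateral space is constructed, so the following is only a rough illustration of the general idea, not the authors' formulation. It assumes a 1-D toy setting in which a reference location is a valid neighbour when it is both spatially close to the query and has similar optical flow, with appearance entering through a scaled dot product. The function name `bilateral_window_attention` and the thresholds `max_dist` and `max_flow_diff` are hypothetical.

```python
import math

def bilateral_window_attention(q_feat, q_pos, q_flow,
                               ref_feats, ref_pos, ref_flows, ref_values,
                               max_dist=1.0, max_flow_diff=0.5):
    """Toy 1-D attention restricted to neighbours in (position, flow) space.

    ASSUMPTION: the neighbourhood rule below is illustrative only and is
    not taken from the paper.
    """
    scale = math.sqrt(len(q_feat))
    scores = []
    for feat, pos, flow in zip(ref_feats, ref_pos, ref_flows):
        near = abs(pos - q_pos) <= max_dist                 # spatial neighbour
        similar_motion = abs(flow - q_flow) <= max_flow_diff  # motion neighbour
        if near and similar_motion:
            # Appearance similarity: scaled dot product of feature vectors.
            scores.append(sum(a * b for a, b in zip(q_feat, feat)) / scale)
        else:
            scores.append(-math.inf)                        # masked out

    top = max(scores)
    if top == -math.inf:
        # No valid neighbour: return zeros instead of dividing by zero.
        return [0.0] * len(ref_values[0]), [0.0] * len(ref_values)

    # Numerically stable softmax; masked entries become exactly 0.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(ref_values[0])
    out = [sum(w * v[d] for w, v in zip(weights, ref_values)) for d in range(dim)]
    return out, weights
```

In this sketch, two reference locations with identical appearance receive different treatment when one moves differently from the query, which is the kind of disambiguation the abstract motivates for visually similar objects that are close together.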

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| DAVIS 2016 | RMN (val) | J&F | 88.8 | | Unverified |
| DAVIS 2016 | STM (val) | F-Score | 89.9 | | Unverified |
| DAVIS 2016 | BATMAN (val) | J&F | 92.5 | | Unverified |
| DAVIS 2016 | STCN (val) | J&F | 91.6 | | Unverified |
| DAVIS 2016 | AOT (val) | J&F | 91.1 | | Unverified |
| DAVIS 2016 | LCM (val) | J&F | 90.7 | | Unverified |
| DAVIS 2016 | RPCMVOS (val) | J&F | 90.6 | | Unverified |
| DAVIS 2016 | KMN (val) | J&F | 90.5 | | Unverified |
| DAVIS 2016 | TransVOS (val) | J&F | 90.5 | | Unverified |
| DAVIS 2016 | CFBI+ (val) | J&F | 89.9 | | Unverified |
| DAVIS 2016 | CFBI (val) | J&F | 89.4 | | Unverified |
| DAVIS-2017 (test-dev) | RMN | Jaccard | 71.9 | | Unverified |
| DAVIS-2017 (test-dev) | CFBI | Jaccard | 71.4 | | Unverified |
| DAVIS-2017 (test-dev) | BATMAN | Jaccard | 78.4 | | Unverified |
| DAVIS-2017 (test-dev) | LCM | Jaccard | 74.4 | | Unverified |
| DAVIS-2017 (test-dev) | KMN | Jaccard | 74.1 | | Unverified |
| DAVIS-2017 (test-dev) | TransVOS | Jaccard | 73 | | Unverified |
| DAVIS-2017 (test-dev) | STCN | Jaccard | 72.7 | | Unverified |
| DAVIS-2017 (test-dev) | CFBI+ | Jaccard | 71.6 | | Unverified |
| DAVIS 2017 (val) | RMN | Mean Jaccard & F-Measure | 83.5 | | Unverified |
| DAVIS 2017 (val) | BATMAN | Mean Jaccard & F-Measure | 86.2 | | Unverified |
| DAVIS 2017 (val) | STCN | Mean Jaccard & F-Measure | 85.4 | | Unverified |
| DAVIS 2017 (val) | AOT | Mean Jaccard & F-Measure | 84.9 | | Unverified |
| DAVIS 2017 (val) | TransVOS | Mean Jaccard & F-Measure | 83.9 | | Unverified |
| DAVIS 2017 (val) | RPCMVOS | Mean Jaccard & F-Measure | 83.7 | | Unverified |
| DAVIS 2017 (val) | CFBI+ | Mean Jaccard & F-Measure | 82.9 | | Unverified |
| DAVIS 2017 (val) | KMN | Mean Jaccard & F-Measure | 82.8 | | Unverified |
| DAVIS 2017 (val) | SST | Mean Jaccard & F-Measure | 82.5 | | Unverified |
| DAVIS 2017 (val) | CFBI | Mean Jaccard & F-Measure | 81.9 | | Unverified |
| DAVIS 2017 (val) | LWL | Mean Jaccard & F-Measure | 81.6 | | Unverified |
| DAVIS 2017 (val) | AFB-URR | Mean Jaccard & F-Measure | 74.6 | | Unverified |
| DAVIS 2017 (val) | LCM | F-measure | 86.5 | | Unverified |
| DAVIS 2017 (val) | STM | F-measure | 84.3 | | Unverified |
| YouTube-VOS 2018 | RMN | Jaccard (Seen) | 82.1 | | Unverified |
| YouTube-VOS 2018 | SST | Mean Jaccard & F-Measure | 81.7 | | Unverified |
| YouTube-VOS 2018 | LWL | Mean Jaccard & F-Measure | 81.5 | | Unverified |
| YouTube-VOS 2018 | KMN | Mean Jaccard & F-Measure | 81.4 | | Unverified |
| YouTube-VOS 2018 | AFB-URR | Mean Jaccard & F-Measure | 79.6 | | Unverified |
| YouTube-VOS 2018 | STM | Mean Jaccard & F-Measure | 79.4 | | Unverified |
| YouTube-VOS 2018 | CFBI | Jaccard (Seen) | 81.1 | | Unverified |
| YouTube-VOS 2018 | AOT | Mean Jaccard & F-Measure | 84.1 | | Unverified |
| YouTube-VOS 2018 | RPCMVOS | Mean Jaccard & F-Measure | 84 | | Unverified |
| YouTube-VOS 2018 | STCN | Mean Jaccard & F-Measure | 83 | | Unverified |
| YouTube-VOS 2018 | CFBI+ | Mean Jaccard & F-Measure | 82.8 | | Unverified |
| YouTube-VOS 2018 | LCM | Mean Jaccard & F-Measure | 82 | | Unverified |
| YouTube-VOS 2018 | TransVOS | Mean Jaccard & F-Measure | 81.8 | | Unverified |
| YouTube-VOS 2019 | BATMAN | Mean Jaccard & F-Measure | 85 | | Unverified |
| YouTube-VOS 2019 | CFBI | Mean Jaccard & F-Measure | 81 | | Unverified |
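
For readers unfamiliar with the metrics above: Jaccard (J) is the region intersection-over-union between a predicted and a ground-truth mask, F is a boundary-accuracy F-measure, and J&F is their arithmetic mean. The sketch below computes J for binary masks given as nested lists and averages it with an F value supplied by the caller. The function names are hypothetical, the boundary F-measure itself is not implemented, and the choice to return 1.0 when both masks are empty is an assumed convention. Benchmark figures are normally produced with each benchmark's official evaluation tooling rather than code like this.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are equally sized nested lists of 0/1 (or False/True) values.
    ASSUMPTION: two empty masks count as a perfect match (J = 1.0).
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and boundary accuracy F."""
    return (j + f) / 2.0


# Usage: one shared foreground pixel out of two in the union.
pred_mask = [[1, 1],
             [0, 0]]
gt_mask = [[1, 0],
           [0, 0]]
j = jaccard(pred_mask, gt_mask)
```

Scores in the table are reported as percentages, so a J&F of 0.925 on this 0-to-1 scale corresponds to an entry of 92.5.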
