RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

2020-10-01 · Code Available

Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giro-i-Nieto


Abstract

The task of video object segmentation with referring expressions (language-guided VOS) is, given a linguistic phrase and a video, to generate binary masks for the object to which the phrase refers. Our work argues that existing benchmarks used for this task are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the phrases in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, with the non-trivial REs annotated with seven RE semantic categories. We leverage this data to analyze the results of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for language-guided VOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
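The abstract describes a model that fuses a sentence embedding with visual features to score each pixel as referent or background. A toy NumPy sketch of this kind of multiplicative language-visual fusion is below; every name, shape, and operation here is illustrative and hypothetical, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a C-channel visual feature map (e.g. from a
# segmentation backbone) and a C-dim sentence embedding (e.g. from a
# language encoder such as BERT).
C, H, W = 8, 4, 4
visual_feats = rng.standard_normal((C, H, W))
text_embed = rng.standard_normal(C)

# Multiplicative fusion: broadcast the language vector over every
# spatial location so language modulates each visual channel.
fused = visual_feats * text_embed[:, None, None]

# Stand-in for a learned 1x1 prediction head: a per-channel weight
# vector contracted against the fused features gives per-pixel logits.
head = rng.standard_normal(C)
logits = np.tensordot(head, fused, axes=([0], [0]))  # shape (H, W)

# Threshold the logits to get a binary mask for the referred object.
mask = logits > 0
print(mask.shape)  # (4, 4)
```

In a trained model the fusion weights and prediction head are learned end to end; the sketch only shows how a single language vector can condition a dense, per-pixel prediction.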

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| A2Dre test | RefVOS | Overall IoU | 47.5 | | Unverified |
| A2D Sentences | RefVOS | Overall IoU | 0.6 | | Unverified |
| A2D Sentences | RefVOS | Overall IoU | 0.67 | | Unverified |
| DAVIS 2017 (val) | RefVOS | J&F 1st frame | 44.5 | | Unverified |
| DAVIS 2017 (val) | RefVOS | J&F 1st frame | 45.1 | | Unverified |
| RefCOCO (testA) | RefVOS with BERT + MLM loss | Overall IoU | 49.73 | | Unverified |
| RefCOCO+ (testB) | RefVOS with BERT + MLM loss | Overall IoU | 36.17 | | Unverified |
| RefCOCO (val) | RefVOS with BERT + MLM loss | Overall IoU | 59.45 | | Unverified |
| RefCOCO (val) | RefVOS with BERT + MLM loss | Overall IoU | 44.71 | | Unverified |
| RefCOCO (val) | RefVOS with BERT pre-train | Overall IoU | 58.65 | | Unverified |
