Actor and Action Video Segmentation from a Sentence
Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek
Abstract
This paper strives for pixel-level segmentation of actors and their actions in video content. Different from existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows us to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment pairs that are outside of the actor and action vocabulary. We propose a fully-convolutional model for pixel-level actor and action segmentation using an encoder-decoder architecture optimized for video. To show the potential of actor and action video segmentation from a sentence, we extend two popular actor and action datasets with more than 7,500 natural language descriptions. Experiments demonstrate the quality of the sentence-guided segmentations, the generalization ability of our model, and its advantage for traditional actor and action segmentation compared to the state-of-the-art.
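The abstract describes conditioning a fully-convolutional segmentation model on a natural language sentence. A minimal sketch of one common way to realize this, assuming (hypothetically) that the sentence embedding is mapped to a 1x1 convolution filter applied over the visual feature map; all function names, shapes, and the random stand-ins for learned components are illustrative, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_sentence(embedding_dim=128):
    # Stand-in for a learned text encoder (e.g. word embeddings + pooling).
    return rng.standard_normal(embedding_dim)

def sentence_to_filter(sentence_vec, feature_channels=64):
    # Hypothetical linear layer mapping the sentence embedding to a
    # 1x1 convolution filter over the visual feature channels.
    W = rng.standard_normal((feature_channels, sentence_vec.size)) * 0.01
    return W @ sentence_vec                       # shape: (feature_channels,)

def segment(features, conv_filter):
    # 1x1 "dynamic" convolution: dot product of the filter with the
    # channel vector at every spatial location -> per-pixel response map,
    # which a decoder would upsample to the full segmentation mask.
    return np.einsum('chw,c->hw', features, conv_filter)

features = rng.standard_normal((64, 32, 32))      # C x H x W visual features
response = segment(features, sentence_to_filter(encode_sentence()))
print(response.shape)
```

The key design point this illustrates: because the sentence determines the filter, the same visual backbone can segment actor-action pairs never seen as a fixed class, which is what frees the model from a closed vocabulary.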
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| A2D Sentences | Gavrilyuk et al. (Optical flow) | AP | 0.22 | — | Unverified |
| A2D Sentences | Gavrilyuk et al. | AP | 0.20 | — | Unverified |
| J-HMDB | Gavrilyuk et al. (Optical flow) | AP | 0.27 | — | Unverified |
| J-HMDB | Gavrilyuk et al. | AP | 0.23 | — | Unverified |