Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

2012-12-01 · NeurIPS 2012

Du Tran, Junsong Yuan

Abstract

Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because one needs to predict the locations of the action patterns both spatially and temporally, i.e., to identify a sequence of bounding boxes that track the action in the video. The problem becomes intractable due to the exponentially large size of the structured video space where actions could occur. We propose a novel structured learning approach for spatio-temporal action localization. The mapping between a video and a spatio-temporal action trajectory is learned. The intractable inference and learning problems are addressed by leveraging an efficient Max-Path search method, which makes it feasible to optimize the model over the whole structured space. Experiments on two challenging benchmark datasets show that our proposed method outperforms the state-of-the-art methods.
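The abstract does not spell out the Max-Path search, so the following is only a rough illustration of the general idea of searching for the best-scoring trajectory through per-frame local scores, not the authors' implementation. It assumes a simplified setting: each frame has a one-dimensional row of candidate positions, local scores may be negative, a path may start and end at any frame, and it may move at most `radius` positions between consecutive frames. The function name `max_path`, the `radius` parameter, and the score layout are all hypothetical.

```python
def max_path(scores, radius=1):
    """Find the highest-scoring temporally connected path.

    scores[t][x] is the (possibly negative) local score of position x
    in frame t. Returns (best_total, path), path being a list of (t, x).
    Simplified sketch: 1-D positions, hypothetical names throughout.
    """
    best_total = float("-inf")
    best_end = None
    prev = []   # accumulated scores for the previous frame
    back = []   # back[t][x]: predecessor position in frame t-1, or None

    for t, frame in enumerate(scores):
        cur, ptr = [], []
        for x, local in enumerate(frame):
            # Extend the best positive-scoring neighbour path,
            # or start a fresh path here (accumulated score 0).
            best_prev, best_x = 0.0, None
            if prev:
                lo = max(0, x - radius)
                hi = min(len(prev), x + radius + 1)
                for px in range(lo, hi):
                    if prev[px] > best_prev:
                        best_prev, best_x = prev[px], px
            total = best_prev + local
            cur.append(total)
            ptr.append(best_x)
            if total > best_total:
                best_total, best_end = total, (t, x)
        back.append(ptr)
        prev = cur

    if best_end is None:  # no frames or no positions at all
        return 0.0, []

    # Follow back-pointers from the best end point to the path start.
    path = []
    t, x = best_end
    while x is not None:
        path.append((t, x))
        x = back[t][x]
        t -= 1
    path.reverse()
    return best_total, path


if __name__ == "__main__":
    demo = [[-1, 2, -1], [-1, -1, 3], [-5, -5, -5], [1, -1, -1]]
    print(max_path(demo))
```

A single forward pass visits every cell once and inspects a bounded neighbourhood, so the cost under these assumptions grows linearly with the number of frames and positions, even though the number of candidate paths grows exponentially.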

Tasks

Action Localization, Object Localization, Regression, Spatio-Temporal Action Localization, Temporal Action Localization
