Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition

2016-11-28 · CVPR 2017

Timur Bagautdinov, Alexandre Alahi, François Fleuret, Pascal Fua, Silvio Savarese

Abstract

We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is handled via a person-level matching Recurrent Neural Network. The complete model takes as input a sequence of frames and outputs detections along with the estimates of individual actions and collective activities. We demonstrate state-of-the-art performance of our algorithm on multiple publicly available benchmarks.
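The abstract names the person-level matching step but does not say how people are linked across frames. The sketch below is a rough illustration of what that linking involves. It pairs boxes in consecutive frames by greedy intersection-over-union (IoU) matching, then carries each matched person's recurrent state forward. This is an assumption for illustration, not the authors' procedure. The helpers `iou`, `match_detections`, and `propagate_states` and the 0.3 threshold are hypothetical, and the actual model may score matches differently.

```python
from typing import Any, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(prev: List[Box], curr: List[Box],
                     min_iou: float = 0.3) -> List[Tuple[int, int]]:
    """Greedily pair boxes from consecutive frames, highest IoU first.

    Hypothetical helper: returns sorted (prev_index, curr_index) pairs;
    boxes whose best overlap is below min_iou stay unmatched.
    """
    candidates = [(iou(p, c), i, j)
                  for i, p in enumerate(prev)
                  for j, c in enumerate(curr)]
    candidates.sort(reverse=True)  # best overlaps first
    used_prev, used_curr = set(), set()
    matches = []
    for score, i, j in candidates:
        if score < min_iou:
            break  # all remaining candidates overlap even less
        if i in used_prev or j in used_curr:
            continue  # enforce one-to-one matching
        used_prev.add(i)
        used_curr.add(j)
        matches.append((i, j))
    return sorted(matches)


def propagate_states(prev_states: List[Any], matches: List[Tuple[int, int]],
                     num_curr: int, init_state: Any) -> List[Any]:
    """Hand each person's recurrent state to its matched box in the new frame.

    Unmatched (newly appearing) people start from init_state.
    """
    curr_states = [init_state] * num_curr
    for i, j in matches:
        curr_states[j] = prev_states[i]
    return curr_states
```

For example, with `prev = [(0, 0, 10, 10), (20, 20, 30, 30)]` and `curr = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]`, the two people swap list positions and a third person appears. The matching returns `[(0, 1), (1, 0)]`, and the new box receives `init_state`. Greedy matching keeps the sketch short. An optimal one-to-one assignment, such as the Hungarian algorithm, is a common alternative when many people overlap.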

Tasks

Action Localization · Action Recognition · Activity Recognition · Scene Understanding

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Volleyball	GTT (VGG19)	Accuracy	82.6	—	Unverified
Volleyball	SSU (GT)	Accuracy	81.8	—	Unverified