Grounded Situation Recognition with Transformers

2021-11-19

Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak


Code

github.com/jhcho99/gsrtr
Official · In paper · PyTorch · ★ 27

Abstract

Grounded Situation Recognition (GSR) is a task that not only classifies a salient action (verb), but also predicts the entities (nouns) associated with semantic roles and their locations in the given image. Inspired by the remarkable success of Transformers in vision tasks, we propose a GSR model based on a Transformer encoder-decoder architecture. The attention mechanism of our model enables accurate verb classification by effectively capturing high-level semantic features of an image, and allows the model to flexibly deal with the complicated, image-dependent relations between entities for improved noun classification and localization. Our model is the first Transformer architecture for GSR, and achieves the state of the art in every evaluation metric on the SWiG benchmark. Our code is available at https://github.com/jhcho99/gsrtr.
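The task definition above can be made concrete with a small data structure: one verb per image, plus a noun and an optional bounding box for each semantic role. This is a minimal sketch under stated assumptions; the class names, the role names (agent, item, tool, place), the nouns, and the (x1, y1, x2, y2) box format are hypothetical illustrations, not the SWiG annotation format or the classes used in the gsrtr repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels; None = entity not localized.
Box = Optional[Tuple[float, float, float, float]]


@dataclass
class RoleFiller:
    noun: str  # entity filling the semantic role (illustrative label)
    box: Box   # where that entity is in the image, or None


@dataclass
class GroundedSituation:
    verb: str  # the single salient action for the whole image
    roles: Dict[str, RoleFiller] = field(default_factory=dict)

    def grounded_roles(self) -> List[str]:
        """Return the names of roles whose entity has a bounding box."""
        return [name for name, filler in self.roles.items() if filler.box is not None]


# Hypothetical example frame; role and noun names are made up for illustration.
example = GroundedSituation(
    verb="cutting",
    roles={
        "agent": RoleFiller(noun="man", box=(40.0, 20.0, 180.0, 300.0)),
        "item": RoleFiller(noun="bread", box=(150.0, 200.0, 260.0, 260.0)),
        "tool": RoleFiller(noun="knife", box=(170.0, 180.0, 210.0, 230.0)),
        "place": RoleFiller(noun="kitchen", box=None),
    },
)
print(example.grounded_roles())  # ['agent', 'item', 'tool']
```

In this sketch a box of None models an entity that fills a role but is not localized; a GSR model has to predict the verb, every role's noun, and every role's box (or its absence) jointly.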

Tasks

Decoder, Grounded Situation Recognition, Image Classification, Object Detection, Scene Understanding, Situation Recognition, Visual Grounding, Visual Reasoning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
SWiG	GSRTR	Top-1 Verb	40.63	—	Unverified
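As a hedged illustration of what the Top-1 Verb figure measures, the sketch below computes top-1 verb accuracy as a percentage over a set of images, assuming the model outputs one score per verb class for each image. The function name, the score layout, and the toy numbers are hypothetical; this is not the official SWiG evaluation script, and it covers only the verb metric, not the noun or grounding metrics.

```python
from typing import Sequence


def top1_verb_accuracy(verb_scores: Sequence[Sequence[float]],
                       gold_verb_ids: Sequence[int]) -> float:
    """Percentage of images whose highest-scoring verb equals the ground-truth verb id."""
    if len(verb_scores) != len(gold_verb_ids):
        raise ValueError("need one score vector per ground-truth verb")
    if not gold_verb_ids:
        raise ValueError("need at least one image")
    correct = 0
    for scores, gold in zip(verb_scores, gold_verb_ids):
        # Top-1 prediction = index of the largest score (ties go to the lowest index).
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == gold)
    return 100.0 * correct / len(gold_verb_ids)


# Hypothetical toy scores over 3 verb classes for 4 images.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.1, 0.5],
]
gold = [1, 0, 1, 2]
print(top1_verb_accuracy(scores, gold))  # 75.0
```

Here the predictions are verbs 1, 0, 2, 2, so three of four images are correct. The claimed 40.63 for GSRTR is read the same way: the percentage of SWiG test images whose top-scoring verb is correct.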
