HSTforU: anomaly detection in aerial and ground-based videos with hierarchical spatio-temporal transformer for U-net
Viet-Tuan Le, Hulin Jin, Yong-Guk Kim
Abstract
Anomaly detection aims to identify abnormal events against normal ones in surveillance videos, which have mainly been collected in ground-based settings. Recently, the demand for processing drone-collected data has been growing rapidly with the expanding range of drone applications. However, because most aerial videos captured by flying drones contain dynamic backgrounds and moving objects, their spatio-temporal features must be handled carefully when detecting anomalies. This study presents a transformer-based video anomaly detection method, evaluated on a challenging aerial dataset and three benchmark ground-based datasets. A multi-stage transformer serves as an encoder that generates multi-scale feature maps; these are conveyed to a hierarchical spatio-temporal transformer, which links the encoder to the decoder and captures spatial and temporal information through a joint attention mechanism. Extensive evaluations, including several ablation studies, show that this network outperforms state-of-the-art methods. We expect the proposed transformer for U-net to find diverse applications in the video processing area. Code and pre-trained models are publicly available at https://github.com/vt-le/HSTforU.
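The key idea in the abstract, joint attention over spatial and temporal dimensions, can be sketched in a few lines. The snippet below is a minimal, illustrative NumPy version, not the authors' implementation: it flattens tokens from all frames and all spatial positions into one set so that every space-time token attends to every other, in contrast to factored designs that attend over space and time separately. The identity projections and scaling are simplifying assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_spatiotemporal_attention(feats):
    """feats: (T, H, W, C) feature maps from T consecutive frames.

    Tokens from every frame and every spatial position are pooled into a
    single set, so attention mixes spatial and temporal context jointly.
    """
    T, H, W, C = feats.shape
    tokens = feats.reshape(T * H * W, C)      # joint space-time token set
    q = k = v = tokens                        # identity projections (illustrative)
    attn = softmax(q @ k.T / np.sqrt(C))      # (THW, THW) joint attention weights
    out = attn @ v                            # attended features
    return out.reshape(T, H, W, C)
```

In the paper's hierarchical design, such attention is applied to the multi-scale feature maps produced by the encoder stages; a real implementation would use learned query/key/value projections and multiple heads.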