
Action Detection

Action Detection aims to find both where and when an action occurs within a video clip, and to classify which action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
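To make the tubelet output format concrete, here is a minimal Python sketch, assuming a tubelet is stored as a mapping from frame index to an (x1, y1, x2, y2) box. The `Tubelet` class and all helper names are hypothetical. `temporal_iou` measures the overlap of two frame intervals (the quantity that temporal localization is scored on), and `spatiotemporal_iou` follows one common definition of tube overlap: temporal IoU multiplied by the mean per-frame box IoU over the frames both tubes share.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


@dataclass
class Tubelet:
    """Hypothetical minimal tubelet: one box per frame for a single action label."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box

    @property
    def start(self) -> int:
        return min(self.boxes)  # first frame index

    @property
    def end(self) -> int:
        return max(self.boxes)  # last frame index (inclusive)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(s1: int, e1: int, s2: int, e2: int) -> float:
    """IoU of two inclusive frame intervals [s1, e1] and [s2, e2]."""
    inter = max(0, min(e1, e2) - max(s1, s2) + 1)
    union = (e1 - s1 + 1) + (e2 - s2 + 1) - inter
    return inter / union


def spatiotemporal_iou(a: Tubelet, b: Tubelet) -> float:
    """Temporal IoU times the mean box IoU over the frames both tubes cover."""
    t_iou = temporal_iou(a.start, a.end, b.start, b.end)
    common = [f for f in a.boxes if f in b.boxes]
    if not common:
        return 0.0
    mean_spatial = sum(box_iou(a.boxes[f], b.boxes[f]) for f in common) / len(common)
    return t_iou * mean_spatial
```

For example, a ground-truth tube over frames 0 to 9 and a prediction over frames 5 to 14 whose boxes cover the top half of the ground-truth box have a temporal IoU of 1/3 and a per-frame box IoU of 0.5, so their spatio-temporal IoU is 1/6.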

Papers

Showing 451–500 of 817 papers

Title | Status | Hype
Sports Video: Fine-Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2021 | Code | 0
Continuous Human Action Detection Based on Wearable Inertial Data | | 0
User Activity Detection and Channel Estimation of Spatially Correlated Channels via AMP in Massive MTC | | 0
Learning Proximal Operator Methods for Massive Connectivity in IoT Networks | | 0
Reformulating Zero-shot Action Recognition for Multi-label Actions | | 0
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information | Code | 0
Weakly-guided Self-supervised Pretraining for Temporal Activity Detection | Code | 0
User Activity Detection for Irregular Repetition Slotted Aloha based MMTC | | 0
Access Delay Constrained Activity Detection in Massive Random Access | | 0
whu-nercms at trecvid2021:instance search task | | 0
Self-Denoising Neural Networks for Few Shot Learning | | 0
CTRN: Class-Temporal Relational Network for Action Detection | | 0
You Ought to Look Around: Precise, Large Span Action Detection | | 0
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR | | 0
PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition | | 0
Deep Learning-based Action Detection in Untrimmed Videos: A Survey | | 0
Information Elevation Network for Fast Online Action Detection | | 0
The VVAD-LRS3 Dataset for Visual Voice Activity Detection | | 0
The Stackelberg Equilibrium for One-sided Zero-sum Partially Observable Stochastic Games | | 0
Learning to Discriminate Information for Online Action Detection: Analysis and Application | | 0
Class Semantics-based Attention for Action Detection | | 0
The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge | | 0
Identity-aware Graph Memory Network for Action Detection | | 0
Sparse Signal Processing for Massive Connectivity via Mixed-Integer Programming | | 0
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection | | 0
Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker | | 0
Video-guided Machine Translation with Spatial Hierarchical Attention Network | | 0
Fine-Grained Classroom Activity Detection from Audio with Neural Networks | Code | 0
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording | | 0
Joint Activity Detection, Channel Estimation, and Data Decoding for Grant-free Massive Random Access | | 0
Spatio-Temporal Context for Action Detection | | 0
SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection | | 0
Exploring Temporal Context and Human Movement Dynamics for Online Action Detection in Videos | | 0
Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets | | 0
Dealing with training and test segmentation mismatch: FBK@IWSLT2021 | | 0
EML Online Speech Activity Detection for the Fearless Steps Challenge Phase-III | | 0
Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection | | 0
Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations | | 0
Algorithm Unrolling for Massive Access via Deep Neural Network with Theoretical Guarantee | | 0
MaCLR: Motion-aware Contrastive Learning of Representations for Videos | Code | 0
JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection | | 0
Relation Modeling in Spatio-Temporal Action Localization | | 0
A Stronger Baseline for Ego-Centric Action Detection | | 0
Joint Channel Estimation and Device Activity Detection in Heterogeneous Networks | | 0
PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection | Code | 0
Accelerating Coordinate Descent via Active Set Selection for Device Activity Detection for Multi-Cell Massive Random Access | | 0
Joint Activity Detection and Data Decoding in Massive Random Access via a Turbo Receiver | | 0
Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation | | 0
Spatial Correlation Aware Compressed Sensing for User Activity Detection and Channel Estimation in Massive MTC | | 0
Spatiotemporal Deformable Scene Graphs for Complex Activity Detection | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified
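Most rows above report Frame-mAP at an IoU threshold of 0.5 or Video-mAP at a threshold of 0.2. As a rough illustration of how such a number is produced, the Python sketch below computes per-class average precision using greedy matching in descending score order and all-point interpolation (the PASCAL VOC 2010-style convention). That convention is an assumption here: each benchmark's official evaluation code may differ in details such as interpolation and tie-breaking, and the function names are hypothetical. For Frame-mAP, keys identify individual frames and `overlap` is box IoU; for Video-mAP, one would instead key by video and pass a tube-level overlap such as spatio-temporal IoU with the lower threshold.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Sequence[Tuple[str, float, Any]],
    ground_truth: Dict[str, List[Any]],
    overlap: Callable[[Any, Any], float],
    threshold: float,
) -> float:
    """AP for a single action class (hypothetical helper).

    detections:   (key, score, region) triples, e.g. key = "video/frame".
    ground_truth: key -> ground-truth regions of this class with that key.
    A detection is a true positive if its best-overlapping ground truth with
    the same key reaches `threshold` and has not already been matched.
    """
    n_gt = sum(len(regions) for regions in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {key: [False] * len(regions) for key, regions in ground_truth.items()}

    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    # Process detections from most to least confident.
    for key, _score, region in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = -1.0, -1
        for j, gt in enumerate(ground_truth.get(key, [])):
            iou = overlap(region, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= threshold and not matched[key][best_j]:
            matched[key][best_j] = True
            tp += 1
        else:
            fp += 1  # no ground truth, too little overlap, or a duplicate hit
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)

    # All-point interpolation: replace each precision with the maximum
    # precision at any equal or higher recall, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """The mAP figure is the mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

For instance, with two ground-truth frames, a top-scoring exact hit on the first frame, a second detection on empty space, and a third detection at box IoU 0.9 on the second frame, `average_precision(..., box_iou, 0.5)` gives 5/6. Raising the threshold to 0.95 turns the third detection into a false positive and the AP drops to 0.5.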

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified
9 | MOC | Frame-mAP 0.5 | 74 | | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 | | Unverified
2 | CTRN | mAP | 27.8 | | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified
6 | PAT | mAP | 26.5 | | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified
10 | MLAD (RGB + Flow) | mAP | 23.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 | | Unverified
2 | CTRN | mAP | 51.2 | | Unverified
3 | PDAN | mAP | 47.6 | | Unverified
4 | TGM | mAP | 46.4 | | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified
6 | I3D + our super-event | mAP | 36.4 | | Unverified
7 | Two-stream + LSTM | mAP | 28.1 | | Unverified
8 | Two-stream | mAP | 27.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified
2 | TadML-two stream | mAP | 59.7 | | Unverified
3 | MAT (ours) | mAP | 58.2 | | Unverified
4 | TadML-rgb | mAP | 53.46 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 | | Unverified
2 | PDAN | Frame-mAP | 32.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 | | Unverified
2 | Two Stream Network | IoU | 0.07 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified
2 | RGB and PRGB | IoU | 0.35 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 | | Unverified