
Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
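To make the "where and when" part concrete, here is a minimal sketch of how a predicted tubelet might be scored against a ground-truth tubelet. It assumes one common convention, in which spatio-temporal IoU is the temporal IoU of the two frame ranges multiplied by the mean box IoU over the frames they share. The names `Box`, `Tubelet`, `box_iou`, and `tubelet_iou` are illustrative and do not come from any specific benchmark toolkit; individual benchmarks may define the overlap differently.

```python
from typing import Dict, Tuple

# Assumed representations (hypothetical, for illustration only):
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Tubelet = Dict[int, Box]                 # frame index -> box in that frame


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tubelet_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Temporal IoU times mean spatial IoU over the shared frames."""
    shared = pred.keys() & gt.keys()
    if not shared:
        return 0.0
    temporal = len(shared) / len(pred.keys() | gt.keys())
    spatial = sum(box_iou(pred[f], gt[f]) for f in shared) / len(shared)
    return temporal * spatial


# Usage: identical boxes, but the prediction covers frames 0-3
# while the ground truth covers frames 2-5.
pred = {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)}
gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(2, 6)}
score = tubelet_iou(pred, gt)  # 2 shared frames out of 6, perfect boxes
```

Under this convention a detection counts as correct when its class matches and its overlap exceeds a threshold. Values such as 0.2 and 0.5 in metric names like Video-mAP and Frame-mAP typically denote that IoU cut-off.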

Papers

Showing 451-500 of 817 papers

Title | Status | Hype
Cross-modal Supervision for Learning Active Speaker Detection in Video |  | 0
CTRN: Class-Temporal Relational Network for Action Detection |  | 0
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model |  | 0
Data-aided Active User Detection with a User Activity Extraction Network for Grant-free SCMA Systems |  | 0
Dataset for Real-World Human Action Detection Using FMCW mmWave Radar |  | 0
Dealing with training and test segmentation mismatch: FBK@IWSLT2021 |  | 0
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection |  | 0
Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection |  | 0
Deep Learning-Assisted Parallel Interference Cancellation for Grant-Free NOMA in Machine-Type Communication |  | 0
Deep Learning-based Action Detection in Untrimmed Videos: A Survey |  | 0
Deep learning-based approaches for human motion decoding in smart walkers for rehabilitation |  | 0
Spatial-Temporal Alignment Network for Action Recognition and Detection |  | 0
Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation |  | 0
Spatio-Temporal Action Detection with Multi-Object Interaction |  | 0
Spatio-Temporal Action Localization in a Weakly Supervised Setting |  | 0
Spatio-temporal Action Recognition: A Survey |  | 0
Spatio-Temporal Context for Action Detection |  | 0
Spatio-Temporal Context Prompting for Zero-Shot Action Detection |  | 0
Spatiotemporal Deformable Scene Graphs for Complex Activity Detection |  | 0
Spatiotemporal Deformable Part Models for Action Detection |  | 0
Spatiotemporal Event Graphs for Dynamic Scene Understanding |  | 0
Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features |  | 0
Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios |  | 0
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization |  | 0
Speaker Independent Continuous Speech to Text Converter for Mobile Application |  | 0
Speech enhancement aided end-to-end multi-task learning for voice activity detection |  | 0
Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection |  | 0
SPIRE-SIES: A Spontaneous Indian English Speech Corpus |  | 0
SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection |  | 0
SRG: Snippet Relatedness-based Temporal Action Proposal Generator |  | 0
Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments |  | 0
Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector |  | 0
STMixer: A One-Stage Sparse Action Detector |  | 0
Supporting More Active Users for Massive Access via Data-assisted Activity Detection |  | 0
sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks |  | 0
SVVAD: Personal Voice Activity Detection for Speaker Verification |  | 0
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks |  | 0
TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection |  | 0
TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition |  | 0
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription |  | 0
Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario |  | 0
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction |  | 0
Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker |  | 0
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization |  | 0
TCG CREST System Description for the Second DISPLACE Challenge |  | 0
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks |  | 0
Temporal Action Detection by Joint Identification-Verification |  | 0
Temporal Action Detection Model Compression by Progressive Block Drop |  | 0
Temporal Action Detection with Multi-level Supervision |  | 0
Temporal Action Localization by Structured Maximal Sums |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 |  | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 |  | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 |  | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 |  | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 |  | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 |  | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 |  | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 |  | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 |  | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 |  | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 |  | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 |  | Unverified
9 | MOC | Frame-mAP 0.5 | 74 |  | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 |  | Unverified
2 | CTRN | mAP | 27.8 |  | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 |  | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 |  | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 |  | Unverified
6 | PAT | mAP | 26.5 |  | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 |  | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 |  | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 |  | Unverified
10 | MLAD (RGB + Flow) | mAP | 23.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 |  | Unverified
2 | CTRN | mAP | 51.2 |  | Unverified
3 | PDAN | mAP | 47.6 |  | Unverified
4 | TGM | mAP | 46.4 |  | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 |  | Unverified
6 | I3D + our super-event | mAP | 36.4 |  | Unverified
7 | Two-stream + LSTM | mAP | 28.1 |  | Unverified
8 | Two-stream | mAP | 27.6 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 |  | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 |  | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 |  | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 |  | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 |  | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 |  | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 |  | Unverified
2 | TadML-two stream | mAP | 59.7 |  | Unverified
3 | MAT (ours) | mAP | 58.2 |  | Unverified
4 | TadML-rgb | mAP | 53.46 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 |  | Unverified
2 | PDAN | Frame-mAP | 32.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 |  | Unverified
2 | Two Stream Network | IoU | 0.07 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 |  | Unverified
2 | RGB and PRGB | IoU | 0.35 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 |  | Unverified