SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. This is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
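To make the idea of a tubelet concrete, here is a minimal Python sketch of one way to represent an action instance as per-frame boxes and to score the overlap between a predicted and a ground-truth tubelet. The names `Tubelet`, `box_iou`, and `tubelet_iou` are hypothetical and not taken from any listed paper. The overlap definition (per-frame IoU summed over shared frames, divided by the number of frames in the temporal union) is one common convention, assumed here for illustration; individual benchmarks may define their metrics differently.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


@dataclass
class Tubelet:
    """Hypothetical container: one action instance as boxes linked across frames."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box

    def temporal_extent(self) -> Tuple[int, int]:
        """First and last frame, i.e. what temporal localization alone would report."""
        return min(self.boxes), max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tubelet_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Assumed spatio-temporal overlap: per-frame IoU summed over shared frames,
    divided by the number of frames covered by either tubelet."""
    shared = pred.boxes.keys() & gt.boxes.keys()
    covered = pred.boxes.keys() | gt.boxes.keys()
    if not covered:
        return 0.0
    return sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in shared) / len(covered)
```

Under this convention, a prediction counts as correct at a threshold such as 0.5 when the overlap reaches that value and the class label matches. Frame-level metrics apply `box_iou` to individual frames, while video-level metrics apply an overlap such as `tubelet_iou` to whole tubelets.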

Papers

Showing 351-400 of 817 papers

| Title | Status | Hype |
|---|---|---|
| Automated speech tools for helping communities process restricted-access corpora for language revival efforts | | 0 |
| Anomalous Sound Detection Based on Machine Activity Detection | | 0 |
| CholecTriplet2021: A benchmark challenge for surgical action triplet recognition | Code | 1 |
| E^2TAD: An Energy-Efficient Tracking-based Action Detector | Code | 1 |
| Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition | | 0 |
| An Empirical Study of End-to-End Temporal Action Detection | Code | 1 |
| Faster-TAD: Towards Temporal Action Detection with Proposal Generation and Classification in a Unified Network | | 0 |
| Low-Latency Speech Separation Guided Diarization for Telephone Conversations | Code | 1 |
| Gan-Based Joint Activity Detection and Channel Estimation For Grant-free Random Access | Code | 0 |
| Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models | Code | 1 |
| Deep Learning for Encrypted Traffic Classification and Unknown Data Detection | | 0 |
| Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios | | 0 |
| ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation | Code | 0 |
| RCL: Recurrent Continuous Localization for Temporal Action Detection | | 0 |
| Context-LSTM: a robust classifier for video detection on UCF101 | | 0 |
| Human Attention Detection Using AM-FM Representations | | 0 |
| PAMI-AD: An Activity Detector Exploiting Part-attention and Motion Information in Surveillance Videos | | 0 |
| End-to-End Semi-Supervised Learning for Video Action Detection | Code | 1 |
| SegTAD: Precise Temporal Action Detection via Semantic Segmentation | | 0 |
| Colar: Effective and Efficient Online Action Detection by Consulting Exemplars | Code | 2 |
| Random Access with Massive MIMO-OTFS in LEO Satellite Communications | | 0 |
| VADOI:Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition | | 0 |
| Active Privacy-Utility Trade-off Against Inference in Time-Series Data Sharing | | 0 |
| The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge | | 0 |
| Untrimmed Action Anticipation | | 0 |
| Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge | | 0 |
| The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge | | 0 |
| HGCN: Harmonic gated compensation network for speech enhancement | Code | 1 |
| NAS-VAD: Neural Architecture Search for Voice Activity Detection | Code | 1 |
| Continual Transformers: Redundancy-Free Attention for Online Inference | Code | 1 |
| Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals | | 0 |
| Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization | | 0 |
| Exploiting Temporal Side Information in Massive IoT Connectivity | Code | 1 |
| Merry Go Round: Rotate a Frame and Fool a DNN | | 0 |
| Binary Image Skeletonization Using 2-Stage U-Net | | 0 |
| Two Stream Network for Stroke Detection in Table Tennis | | 0 |
| Spatio-Temporal CNN baseline method for the Sports Video Task of MediaEval 2021 benchmark | Code | 0 |
| Sports Video: Fine-Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2021 | Code | 0 |
| Low Resource Species Agnostic Bird Activity Detection | | 0 |
| SVIP: Sequence VerIfication for Procedures in Videos | Code | 1 |
| Continuous Human Action Detection Based on Wearable Inertial Data | | 0 |
| X-Vector based voice activity detection for multi-genre broadcast speech-to-text | Code | 1 |
| User Activity Detection and Channel Estimation of Spatially Correlated Channels via AMP in Massive MTC | | 0 |
| DCAN: Improving Temporal Action Detection via Dual Context Aggregation | Code | 1 |
| MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection | Code | 1 |
| Learning Proximal Operator Methods for Massive Connectivity in IoT Networks | | 0 |
| Reformulating Zero-shot Action Recognition for Multi-label Actions | | 0 |
| Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information | Code | 0 |
| Weakly-guided Self-supervised Pretraining for Temporal Activity Detection | Code | 0 |
| User Activity Detection for Irregular Repetition Slotted Aloha based MMTC | | 0 |

Benchmark Results

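| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified |
| 4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified |
| 5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified |
| 6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified |
| 8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified |
| 9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified |
| 10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified |
| 3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified |
| 4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified |
| 5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified |
| 6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified |
| 8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified |
| 9 | MOC | Frame-mAP 0.5 | 74 | | Unverified |
| 10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TTM | mAP | 28.79 | | Unverified |
| 2 | CTRN | mAP | 27.8 | | Unverified |
| 3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified |
| 4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified |
| 5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified |
| 6 | PAT | mAP | 26.5 | | Unverified |
| 7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified |
| 8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified |
| 9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified |
| 10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MLAD | mAP | 51.5 | | Unverified |
| 2 | CTRN | mAP | 51.2 | | Unverified |
| 3 | PDAN | mAP | 47.6 | | Unverified |
| 4 | TGM | mAP | 46.4 | | Unverified |
| 5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified |
| 6 | I3D + our super-event | mAP | 36.4 | | Unverified |
| 7 | Two-stream + LSTM | mAP | 28.1 | | Unverified |
| 8 | Two-stream | mAP | 27.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified |
| 2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified |
| 3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified |
| 4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified |
| 5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified |
| 6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified |
| 7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified |
| 2 | TadML-two stream | mAP | 59.7 | | Unverified |
| 3 | MAT (ours) | mAP | 58.2 | | Unverified |
| 4 | TadML-rgb | mAP | 53.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MS-TCT | Frame-mAP | 33.7 | | Unverified |
| 2 | PDAN | Frame-mAP | 32.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN | IoU | 0.14 | | Unverified |
| 2 | Two Stream Network | IoU | 0.07 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified |
| 2 | RGB and PRGB | IoU | 0.35 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PAT | mAP | 44.6 | | Unverified |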