SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
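To make the idea of a tubelet concrete, here is a minimal sketch of one possible representation, together with a spatio-temporal overlap score between a predicted and a ground-truth tubelet. Everything in it is an assumption made for illustration: the names `Tubelet`, `box_iou`, and `tube_iou` are hypothetical, boxes are taken to be `(x1, y1, x2, y2)` corners, and the overlap is computed as temporal IoU multiplied by the mean per-frame box IoU over shared frames, which is one commonly used convention and not necessarily the one behind any benchmark listed on this page.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed box convention: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus one box per frame index."""
    label: str
    boxes: Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(p: Tubelet, q: Tubelet) -> float:
    """Temporal IoU over frame indices times mean box IoU on shared frames."""
    shared = p.boxes.keys() & q.boxes.keys()
    if not shared:
        return 0.0
    all_frames = p.boxes.keys() | q.boxes.keys()
    temporal = len(shared) / len(all_frames)
    spatial = sum(box_iou(p.boxes[f], q.boxes[f]) for f in shared) / len(shared)
    return temporal * spatial


# Usage: a prediction covering frames 0-3 and a ground truth covering frames 2-5.
pred = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)})
truth = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(2, 6)})
score = tube_iou(pred, truth)
```

Under this convention, a detection would count as correct when its label matches and `score` meets a chosen threshold. Action recognition would keep only `label`, and temporal localization would keep only the first and last frame index.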

Papers

Showing 451–500 of 817 papers

Title | Status | Hype
Temporal Action Detection by Joint Identification-Verification | | 0
Temporal Action Detection Model Compression by Progressive Block Drop | | 0
Temporal Action Detection with Multi-level Supervision | | 0
Temporal Action Localization by Structured Maximal Sums | | 0
Temporal Divide-and-Conquer Anomaly Actions Localization in Semi-Supervised Videos with Hierarchical Transformer | | 0
Spatio-Temporal Event Segmentation and Localization for Wildlife Extended Videos | | 0
Temporal-Needle: A view and appearance invariant video descriptor | | 0
Temporal Structure Mining for Weakly Supervised Action Detection | | 0
Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection | | 0
Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations | | 0
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos | | 0
The AFRL IWSLT 2020 Systems: Work-From-Home Edition | | 0
The Cohort and Speechify Libraries for Rapid Construction of Speech Enabled Applications for Android | | 0
The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge | | 0
The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022 | | 0
The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge | | 0
The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023 | | 0
The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge | | 0
The Impact of Silence on Speech Anti-Spoofing | | 0
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge | | 0
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022 | | 0
The Newsbridge -Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description | | 0
The RATS Collection: Supporting HLT Research with Degraded Audio Data | | 0
The SAFE-T Corpus: A New Resource for Simulated Public Safety Communications | | 0
The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges and methods | | 0
The "Sound of Silence" in EEG -- Cognitive voice activity detection | | 0
The Speed Submission to DIHARD II: Contributions & Lessons Learned | | 0
The Stackelberg Equilibrium for One-sided Zero-sum Partially Observable Stochastic Games | | 0
The Use of Video Captioning for Fostering Physical Activity | | 0
The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge | | 0
The VVAD-LRS3 Dataset for Visual Voice Activity Detection | | 0
"This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II) | | 0
Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations | | 0
Time and Frequency Network for Human Action Detection in Videos | | 0
Token Turing Machines | | 0
TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal | | 0
Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition | | 0
Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations | | 0
Towards More Practical Group Activity Detection: A New Benchmark and Model | | 0
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM | | 0
Trajectory-User Linking Is Easier Than You Think | | 0
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR | | 0
Transferable Adversarial Attacks against ASR | | 0
Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection | | 0
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval | | 0
Tri-axial Self-Attention for Concurrent Activity Recognition | | 0
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge | | 0
Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection | | 0
Two Stream Network for Stroke Detection in Table Tennis | | 0
Two-Stream Region Convolutional 3D Network for Temporal Activity Detection | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified
9 | MOC | Frame-mAP 0.5 | 74 | | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 | | Unverified
2 | CTRN | mAP | 27.8 | | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified
6 | PAT | mAP | 26.5 | | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified
10 | MLAD (RGB + Flow) | mAP | 23.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 | | Unverified
2 | CTRN | mAP | 51.2 | | Unverified
3 | PDAN | mAP | 47.6 | | Unverified
4 | TGM | mAP | 46.4 | | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified
6 | I3D + our super-event | mAP | 36.4 | | Unverified
7 | Two-stream + LSTM | mAP | 28.1 | | Unverified
8 | Two-stream | mAP | 27.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified
2 | TadML-two stream | mAP | 59.7 | | Unverified
3 | MAT (ours) | mAP | 58.2 | | Unverified
4 | TadML-rgb | mAP | 53.46 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 | | Unverified
2 | PDAN | Frame-mAP | 32.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 | | Unverified
2 | Two Stream Network | IoU | 0.07 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified
2 | RGB and PRGB | IoU | 0.35 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 | | Unverified