SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
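To make the idea of a tubelet concrete, here is a minimal sketch of linking per-frame bounding boxes across time. It is an illustration only, not the method of any paper listed on this page: the `(x1, y1, x2, y2)` box convention, the function names `box_iou` and `link_tubelet`, and the `min_iou` threshold of 0.3 are all assumptions made for the example. Published approaches typically also use detection scores and more careful linking than this greedy pass.

```python
from typing import List, Tuple

# Hypothetical box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelet(frames: List[List[Box]], start: Box, min_iou: float = 0.3) -> List[Box]:
    """Greedily extend a tubelet from `start` (a box in frames[0]).

    In each later frame, pick the detection that overlaps most with the
    last linked box; stop as soon as no detection reaches `min_iou`.
    """
    tube = [start]
    for detections in frames[1:]:
        best = max(detections, key=lambda d: box_iou(tube[-1], d), default=None)
        if best is None or box_iou(tube[-1], best) < min_iou:
            break  # no overlapping box: the tubelet ends at the previous frame
        tube.append(best)
    return tube


# Toy input: one actor drifting right for three frames, then gone.
frames = [
    [(0, 0, 10, 10)],
    [(1, 0, 11, 10), (50, 50, 60, 60)],
    [(2, 0, 12, 10)],
    [(80, 80, 90, 90)],
]
tube = link_tubelet(frames, frames[0][0])
print(len(tube), tube[-1])
```

The boxes in the returned list answer "where", and the number of linked frames answers "when"; a classifier applied to the tubelet would supply "what".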

Papers

Showing 601-650 of 817 papers

Title | Status | Hype
The Instantaneous Accuracy: a Novel Metric for the Problem of Online Human Behaviour Recognition in Untrimmed Videos | Code | 0
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation |  | 0
A Novel Online Action Detection Framework from Untrimmed Video Streams |  | 0
ZSTAD: Zero-Shot Temporal Activity Detection |  | 0
Cross modal video representations for weakly supervised active speaker localization |  | 0
Argus: Efficient Activity Detection System for Extended Video Analysis |  | 0
DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team |  | 0
Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation | Code | 0
Human Activity Recognition: A Spatio-temporal Image Encoding of 3D Skeleton Data for Online Action Detection | Code | 0
3D ResNet with Ranking Loss Function for Abnormal Activity Detection in Videos |  | 0
End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection |  | 0
Faster Activity and Data Detection in Massive Random Access: A Multi-armed Bandit Approach |  | 0
A Comprehensive Study on Temporal Modeling for Online Action Detection | Code | 0
Personalized Activity Recognition with Deep Triplet Embeddings | Code | 0
End-Point Detection with State Transition Model based on Chunk-Wise Classification |  | 0
Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition | Code | 0
SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Code | 0
Learning to Discriminate Information for Online Action Detection | Code | 0
Video action detection by learning graph-based spatio-temporal interactions | Code | 0
DASZL: Dynamic Action Signatures for Zero-shot Learning |  | 0
SRG: Snippet Relatedness-based Temporal Action Proposal Generator |  | 0
Zero-Shot Imitating Collaborative Manipulation Plans from YouTube Cooking Videos |  | 0
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | Code | 0
Intelligent Reflecting Surface for Massive Device Connectivity: Joint Activity Detection and Channel Estimation |  | 0
A Proposed Artificial intelligence Model for Real-Time Human Action Localization and Tracking |  | 0
The Speed Submission to DIHARD II: Contributions & Lessons Learned |  | 0
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding | Code | 0
A Bin Encoding Training of a Spiking Neural Network-based Voice Activity Detection |  | 0
Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection |  | 0
Multimodal Learning For Classroom Activity Detection |  | 0
AFO-TAD: Anchor-free One-Stage Detector for Temporal Action Detection |  | 0
Learning Temporal Action Proposals With Fewer Labels |  | 0
Temporal Structure Mining for Weakly Supervised Action Detection |  | 0
Hierarchical Self-Attention Network for Action Localization in Videos |  | 0
Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification |  | 0
Computer-Aided Automated Detection of Gene-Controlled Social Actions of Drosophila |  | 0
Multi-Stream Single Shot Spatial-Temporal Action Detection |  | 0
Multi-timescale Trajectory Prediction for Abnormal Human Activity Detection |  | 0
Personal VAD: Speaker-Conditioned Voice Activity Detection | Code | 0
Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization |  | 0
Multi-task Self-Supervised Learning for Human Activity Detection |  | 0
A Novel Approach for Robust Multi Human Action Recognition and Summarization based on 3D Convolutional Neural Networks |  | 0
Attention Filtering for Multi-person Spatiotemporal Action Detection on Deep Two-Stream CNN Architectures |  | 0
An end-to-end (deep) neural network applied to raw EEG, fNIRs and body motion data for data fusion and BCI classification task without any pre-/post-processing |  | 0
Deformable Tube Network for Action Detection in Videos |  | 0
An Acoustic Emission Activity Detection Method based on Short-Term Waveform Features: Application to Metallic Components under Uniaxial Tensile Test |  | 0
vireoJD-MM at Activity Detection in Extended Videos |  | 0
The Second DIHARD Diarization Challenge: Dataset, task, and baselines | Code | 0
Accelerating temporal action proposal generation via high performance computing |  | 0
Learning Spatio-Temporal Representation with Local and Global Diffusion | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 |  | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 |  | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 |  | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 |  | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 |  | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 |  | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 |  | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 |  | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 |  | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 |  | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 |  | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 |  | Unverified
9 | MOC | Frame-mAP 0.5 | 74 |  | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 |  | Unverified
2 | CTRN | mAP | 27.8 |  | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 |  | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 |  | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 |  | Unverified
6 | PAT | mAP | 26.5 |  | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 |  | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 |  | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 |  | Unverified
10 | MLAD (RGB + Flow) | mAP | 23.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 |  | Unverified
2 | CTRN | mAP | 51.2 |  | Unverified
3 | PDAN | mAP | 47.6 |  | Unverified
4 | TGM | mAP | 46.4 |  | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 |  | Unverified
6 | I3D + our super-event | mAP | 36.4 |  | Unverified
7 | Two-stream + LSTM | mAP | 28.1 |  | Unverified
8 | Two-stream | mAP | 27.6 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 |  | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 |  | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 |  | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 |  | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 |  | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 |  | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 |  | Unverified
2 | TadML-two stream | mAP | 59.7 |  | Unverified
3 | MAT (ours) | mAP | 58.2 |  | Unverified
4 | TadML-rgb | mAP | 53.46 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 |  | Unverified
2 | PDAN | Frame-mAP | 32.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 |  | Unverified
2 | Two Stream Network | IoU | 0.07 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 |  | Unverified
2 | RGB and PRGB | IoU | 0.35 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 |  | Unverified