SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
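To make the idea of a tubelet concrete, the following is a minimal Python sketch, not taken from any of the listed papers. It represents a tubelet as a class label plus one bounding box per frame, and scores the overlap between a predicted and a ground-truth tubelet as temporal IoU multiplied by the mean per-frame box IoU over the shared frames. That is one common convention for spatio-temporal overlap; individual benchmarks may define it differently. The names `Tubelet`, `box_iou`, `temporal_iou`, and `tube_iou` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """One action instance: a class label plus one box per frame index."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box

    @property
    def span(self) -> Tuple[int, int]:
        """First and last frame index covered by the tubelet (inclusive)."""
        return min(self.boxes), max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(a: Tubelet, b: Tubelet) -> float:
    """IoU of the two sets of frame indices (the 'when' part)."""
    frames_a, frames_b = set(a.boxes), set(b.boxes)
    union = frames_a | frames_b
    return len(frames_a & frames_b) / len(union) if union else 0.0


def tube_iou(a: Tubelet, b: Tubelet) -> float:
    """Temporal IoU times the mean box IoU over shared frames (assumed convention)."""
    shared = set(a.boxes) & set(b.boxes)
    if not shared:
        return 0.0
    spatial = sum(box_iou(a.boxes[f], b.boxes[f]) for f in shared) / len(shared)
    return temporal_iou(a, b) * spatial


# Hypothetical example: same box in every frame, but only 5 of 15 frames shared.
pred = Tubelet("run", {f: (10.0, 10.0, 50.0, 50.0) for f in range(0, 10)})
truth = Tubelet("run", {f: (10.0, 10.0, 50.0, 50.0) for f in range(5, 15)})
overlap = tube_iou(pred, truth)  # perfect spatial match, partial temporal match
```

Under this convention a prediction usually counts as correct only if its label matches the ground truth and the overlap exceeds a threshold. The numeric suffix in metric names such as Frame-mAP 0.5 or Video-mAP 0.2 conventionally denotes that IoU threshold, applied per frame or per tube respectively.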

Papers

Showing 201-250 of 817 papers

Title | Status | Hype
EUDAMU at SemEval-2017 Task 11: Action Ranking and Type Matching for End-User Development | - | 0
Cross-domain Voice Activity Detection with Self-Supervised Representations | - | 0
Cross modal video representations for weakly supervised active speaker localization | - | 0
Cross-modal Supervision for Learning Active Speaker Detection in Video | - | 0
CTRN: Class-Temporal Relational Network for Action Detection | - | 0
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model | - | 0
Convolutional Neural Networks for Aerial Multi-Label Pedestrian Detection | - | 0
Continuous Human Action Detection Based on Wearable Inertial Data | - | 0
Data-aided Active User Detection with a User Activity Extraction Network for Grant-free SCMA Systems | - | 0
Dataset for Real-World Human Action Detection Using FMCW mmWave Radar | - | 0
A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos | - | 0
ADA-VAD: Unpaired Adversarial Domain Adaptation for Noise-Robust Voice Activity Detection | - | 0
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection | - | 0
Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection | - | 0
Continual Low-Rank Scaled Dot-product Attention | - | 0
Deep Learning-Assisted Parallel Interference Cancellation for Grant-Free NOMA in Machine-Type Communication | - | 0
Deep Learning-based Action Detection in Untrimmed Videos: A Survey | - | 0
Deep learning-based approaches for human motion decoding in smart walkers for rehabilitation | - | 0
Context Understanding in Computer Vision: A Survey | - | 0
Deep Learning for Asynchronous Massive Access with Data Frame Length Diversity | - | 0
Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos | - | 0
Deep Learning for Encrypted Traffic Classification and Unknown Data Detection | - | 0
A processing framework to access large quantities of whispered speech found in ASMR | - | 0
Detection of Object Throwing Behavior in Surveillance Videos | - | 0
Device Activity Detection and Channel Estimation for Millimeter-Wave Massive MIMO | - | 0
Device Detection and Channel Estimation in MTC with Correlated Activity Pattern | - | 0
Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection | - | 0
Application of Machine Learning Techniques in Human Activity Recognition | - | 0
Access Delay Constrained Activity Detection in Massive Random Access | - | 0
DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team | - | 0
Evaluation of real-time transcriptions using end-to-end ASR models | - | 0
Discovering Spatio-Temporal Action Tubes | - | 0
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications | - | 0
Distributed Optimization for Massive Connectivity | - | 0
DOAD: Decoupled One Stage Action Detection Network | - | 0
Double-Sided Information Aided Temporal-Correlated Massive Access | - | 0
DT4ECG: A Dual-Task Learning Framework for ECG-Based Human Identity Recognition and Human Activity Detection | - | 0
Attention Filtering for Multi-person Spatiotemporal Action Detection on Deep Two-Stream CNN Architectures | - | 0
Dual DETRs for Multi-Label Temporal Action Detection | - | 0
Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding | - | 0
FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection | - | 0
Context-LSTM: a robust classifier for video detection on UCF101 | - | 0
Application-Driven AI Paradigm for Hand-Held Action Detection | - | 0
Early Detection of In-Memory Malicious Activity based on Run-time Environmental Features | - | 0
Effective Abnormal Activity Detection on Multivariate Time Series Healthcare Data | - | 0
Efficient Action Detection in Untrimmed Videos via Multi-Task Learning | - | 0
ContextDet: Temporal Action Detection with Adaptive Context Aggregation | - | 0
A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation | - | 0
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization | - | 0
Context-aware Proposal Network for Temporal Action Detection | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 | - | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 | - | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | - | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 | - | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | - | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 | - | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | - | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 | - | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | - | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 | - | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | - | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 | - | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | - | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 | - | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | - | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | - | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 | - | Unverified
9 | MOC | Frame-mAP 0.5 | 74 | - | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 | - | Unverified
2 | CTRN | mAP | 27.8 | - | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | - | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | - | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 | - | Unverified
6 | PAT | mAP | 26.5 | - | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 | - | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | - | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 | - | Unverified
10 | MLAD (RGB + Flow) | mAP | 23.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 | - | Unverified
2 | CTRN | mAP | 51.2 | - | Unverified
3 | PDAN | mAP | 47.6 | - | Unverified
4 | TGM | mAP | 46.4 | - | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 | - | Unverified
6 | I3D + our super-event | mAP | 36.4 | - | Unverified
7 | Two-stream + LSTM | mAP | 28.1 | - | Unverified
8 | Two-stream | mAP | 27.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | - | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 | - | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 | - | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 | - | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | - | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | - | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 | - | Unverified
2 | TadML-two stream | mAP | 59.7 | - | Unverified
3 | MAT (ours) | mAP | 58.2 | - | Unverified
4 | TadML-rgb | mAP | 53.46 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 | - | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 | - | Unverified
2 | PDAN | Frame-mAP | 32.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 | - | Unverified
2 | Two Stream Network | IoU | 0.07 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 | - | Unverified
2 | RGB and PRGB | IoU | 0.35 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 | - | Unverified