SOTAVerified

Action Detection

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which only classifies which action is taking place and typically assumes a trimmed video.
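The tubelet and start/end-frame outputs described above can be made concrete with a small sketch. It assumes a tubelet is stored as a mapping from frame index to an (x1, y1, x2, y2) box over a contiguous, non-empty run of frames. The names `Tubelet`, `box_iou`, `temporal_extent`, `temporal_iou` and `spatio_temporal_iou` are illustrative and not taken from any particular benchmark toolkit. Scoring two tubelets as temporal IoU times the mean per-frame box IoU is one commonly used convention; individual benchmarks may define it differently.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tubelet = Dict[int, Box]                 # frame index -> box in that frame (assumed contiguous, non-empty)


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def temporal_extent(tube: Tubelet) -> Tuple[int, int]:
    """Start and end frame (inclusive), i.e. what temporal localization predicts."""
    return min(tube), max(tube)


def temporal_iou(a: Tubelet, b: Tubelet) -> float:
    """IoU of the two tubelets' inclusive [start, end] frame intervals."""
    (sa, ea), (sb, eb) = temporal_extent(a), temporal_extent(b)
    overlap = max(0, min(ea, eb) - max(sa, sb) + 1)
    union = (ea - sa + 1) + (eb - sb + 1) - overlap
    return overlap / union


def spatio_temporal_iou(a: Tubelet, b: Tubelet) -> float:
    """Temporal IoU times the mean box IoU over frames where both tubelets have a box."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_box_iou = sum(box_iou(a[t], b[t]) for t in shared) / len(shared)
    return temporal_iou(a, b) * mean_box_iou


# Usage: identical boxes, but the prediction starts 5 frames late and runs 5 frames past the end.
gt = {t: (10.0, 10.0, 50.0, 50.0) for t in range(0, 10)}     # frames 0..9
pred = {t: (10.0, 10.0, 50.0, 50.0) for t in range(5, 15)}   # frames 5..14
print(temporal_extent(pred))            # (5, 14)
print(spatio_temporal_iou(gt, pred))    # 5 shared frames / 15 in the union, about 0.333
```

Because the boxes agree perfectly on the shared frames, the score here is limited only by the temporal misalignment, which is exactly the part of the problem that temporal localization addresses on its own.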

Papers

Showing 551-600 of 817 papers

Title | Status | Hype
Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety |  | 0
Unfolding Videos Dynamics via Taylor Expansion |  | 0
Unified Graph Structured Models for Video Understanding |  | 0
Union of Low-Rank Subspaces Detector |  | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection |  | 0
Unsupervised Action Proposal Ranking through Proposal Recombination |  | 0
Unsupervised Human Action Detection by Action Matching |  | 0
Untrimmed Action Anticipation |  | 0
Unveiling ECC Vulnerabilities: LSTM Networks for Operation Recognition in Side-Channel Attacks |  | 0
Unveiling the Power of Complex-Valued Transformers in Wireless Communications |  | 0
User Activity Detection and Channel Estimation of Spatially Correlated Channels via AMP in Massive MTC |  | 0
User Activity Detection for Irregular Repetition Slotted Aloha based MMTC |  | 0
User Activity Detection with Delay-Calibration for Asynchronous Massive Random Access |  | 0
User Adaptive Restoration for Incorrectly-Segmented Utterances in Spoken Dialogue Systems |  | 0
Using joint angles based on the international biomechanical standards for human action recognition and related tasks |  | 0
USTC-NELSLIP System Description for DIHARD-III Challenge |  | 0
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording |  | 0
VADOI:Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition |  | 0
VAST: A Corpus of Video Annotation for Speech Technologies |  | 0
Vessel Re-identification and Activity Detection in Thermal Domain for Maritime Surveillance |  | 0
Video Action Detection: Analysing Limitations and Challenges |  | 0
VideoCapsuleNet: A Simplified Network for Action Detection |  | 0
Video Event Detection by Exploiting Word Dependencies from Image Captions |  | 0
Video-guided Machine Translation with Spatial Hierarchical Attention Network |  | 0
vireoJD-MM at Activity Detection in Extended Videos |  | 0
Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets |  | 0
Voice Activity Detection using Temporal Characteristics of Autocorrelation Lag and Maximum Spectral Amplitude in Sub-bands |  | 0
VOXLINGUA107: A DATASET FOR SPOKEN LANGUAGE RECOGNITION |  | 0
VSANet: Real-time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention |  | 0
Watch Only Once: An End-to-End Video Action Detection Framework |  | 0
Weakly-Supervised Action Detection Guided by Audio Narration |  | 0
Weakly Supervised Gaussian Networks for Action Detection |  | 0
Whispy: Adapting STT Whisper Models to Real-Time Environments |  | 0
WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos |  | 0
You Ought to Look Around: Precise, Large Span Action Detection |  | 0
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection |  | 0
DASZL: Dynamic Action Signatures for Zero-shot Learning |  | 0
Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning |  | 0
A Proposed Artificial intelligence Model for Real-Time Human Action Localization and Tracking |  | 0
Multi-Stream Single Shot Spatial-Temporal Action Detection |  | 0
Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention |  | 0
Multi-task Self-Supervised Learning for Human Activity Detection |  | 0
Multi-Task Sub-Band Network For Deep Residual Echo Suppression |  | 0
Multi-timescale Event Detection in Nonintrusive Load Monitoring based on MDL Principle |  | 0
Multi-timescale Trajectory Prediction for Abnormal Human Activity Detection |  | 0
Neural Dialogue Context Online End-of-Turn Detection |  | 0
Representation Learning on Visual-Symbolic Graphs for Video Understanding |  | 0
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining |  | 0
NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge |  | 0
Nudge: Accelerating Overdue Pull Requests Towards Completion |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 |  | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 |  | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 |  | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 |  | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 |  | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 |  | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 |  | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 |  | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 |  | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 |  | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 |  | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 |  | Unverified
9 | MOC | Frame-mAP 0.5 | 74 |  | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 |  | Unverified
2 | CTRN | mAP | 27.8 |  | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 |  | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 |  | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 |  | Unverified
6 | PAT | mAP | 26.5 |  | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 |  | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 |  | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 |  | Unverified
10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 |  | Unverified
2 | CTRN | mAP | 51.2 |  | Unverified
3 | PDAN | mAP | 47.6 |  | Unverified
4 | TGM | mAP | 46.4 |  | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 |  | Unverified
6 | I3D + our super-event | mAP | 36.4 |  | Unverified
7 | Two-stream + LSTM | mAP | 28.1 |  | Unverified
8 | Two-stream | mAP | 27.6 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 |  | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 |  | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 |  | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 |  | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 |  | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 |  | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 |  | Unverified
2 | TadML-two stream | mAP | 59.7 |  | Unverified
3 | MAT (ours) | mAP | 58.2 |  | Unverified
4 | TadML-rgb | mAP | 53.46 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 |  | Unverified
2 | PDAN | Frame-mAP | 32.7 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 |  | Unverified
2 | Two Stream Network | IoU | 0.07 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 |  | Unverified
2 | RGB and PRGB | IoU | 0.35 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 |  | Unverified
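Most metrics in the tables above (Frame-mAP at an IoU threshold of 0.5, Video-mAP at 0.2 or 0.5, and plain mAP) are means over action classes of a per-class average precision. The sketch below illustrates one common detection-style protocol under stated assumptions: detections are ranked by confidence, a detection counts as a true positive when it overlaps a still-unmatched ground-truth instance at or above the IoU threshold, and the interpolated precision-recall curve is integrated. Frame-mAP would plug in a per-frame box IoU and Video-mAP a tube IoU; the example uses 1-D temporal segments to keep it short. The functions `average_precision` and `segment_iou` are hypothetical, and real benchmark scorers differ in interpolation, tie-breaking, and in some cases (for example per-frame multi-label mAP) do not use IoU matching at all, so this is an illustration rather than any leaderboard's evaluation code.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # a region type: a box, a tube, or a (start, end) segment


def average_precision(
    detections: Sequence[Tuple[float, T]],  # (confidence score, predicted region) for one class
    ground_truth: Sequence[T],              # ground-truth regions for the same class
    iou: Callable[[T, T], float],
    threshold: float,
) -> float:
    """All-point interpolated AP with greedy matching to unmatched ground truth."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags: List[bool] = []
    # Visit detections from most to least confident.
    for _, region in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if not matched[j]:
                overlap = iou(region, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        is_tp = best_j >= 0 and best_iou >= threshold
        if is_tp:
            matched[best_j] = True  # each ground-truth instance can be matched only once
        tp_flags.append(is_tp)

    # Precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Interpolate: precision at each rank becomes the best precision at that rank or later.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def segment_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """IoU of two temporal segments given as (start, end) times."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - overlap
    return overlap / union if union > 0 else 0.0


# Usage: two ground-truth actions, three detections (one is a false positive).
gt = [(0.0, 10.0), (20.0, 30.0)]
dets = [(0.9, (0.0, 9.0)), (0.8, (50.0, 60.0)), (0.7, (21.0, 30.0))]
ap = average_precision(dets, gt, segment_iou, threshold=0.5)
print(ap)  # about 0.833 (= 5/6)
```

Averaging such per-class AP values over all action classes gives the mAP figure; raising the threshold (0.5 instead of 0.2 for Video-mAP) makes the same set of detections score lower, which is why the rows above are only comparable when the metric column matches.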