SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
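To make the idea of linked boxes concrete, here is a minimal Python sketch of one way a tubelet could be represented and compared. The names `Tubelet`, `box_iou`, and `temporal_iou` are hypothetical and chosen for illustration; they do not come from any listed paper or library, and the `(x1, y1, x2, y2)` box layout is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus one box per frame."""
    label: str
    boxes: Dict[int, Box]  # frame index -> bounding box

    def temporal_extent(self) -> Tuple[int, int]:
        # First and last frame covered: the "when" of the action.
        return min(self.boxes), max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two boxes: the "where"."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(a: Tubelet, b: Tubelet) -> float:
    """Overlap of two tubelets' frame ranges (inclusive frame counts)."""
    s1, e1 = a.temporal_extent()
    s2, e2 = b.temporal_extent()
    inter = max(0, min(e1, e2) - max(s1, s2) + 1)
    union = (e1 - s1 + 1) + (e2 - s2 + 1) - inter
    return inter / union


# A predicted tubelet covering frames 10-19 and a ground-truth one covering 15-24.
pred = Tubelet("run", {f: (0.0, 0.0, 10.0, 10.0) for f in range(10, 20)})
gt = Tubelet("run", {f: (5.0, 0.0, 15.0, 10.0) for f in range(15, 25)})
print(pred.temporal_extent(), temporal_iou(pred, gt), box_iou(pred.boxes[15], gt.boxes[15]))
```

In this sketch, temporal localization corresponds to recovering only `temporal_extent()`, while action recognition corresponds to predicting only `label` for an already trimmed clip.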

Papers

Showing 301-350 of 817 papers

Title | Status | Hype
Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices |  | 0
Spatio-Temporal Action Detection Under Large Motion | Code | 0
A Circular Window-based Cascade Transformer for Online Action Detection |  | 0
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization |  | 0
Actor-identified Spatiotemporal Action Detection --- Detecting Who Is Doing What in Videos | Code | 0
Enabling Weakly-Supervised Temporal Action Localization from On-Device Learning of the Video Stream |  | 0
Review on Action Recognition for Accident Detection in Smart City Transportation Systems |  | 0
Weakly Supervised Online Action Detection for Infant General Movements | Code | 0
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos |  | 0
Bodily Behaviors in Social Interaction: Novel Annotations and State-of-the-Art Evaluation |  | 0
Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions | Code | 1
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection |  | 0
Spotting Temporally Precise, Fine-Grained Events in Video | Code | 1
Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning | Code | 1
Zero-Shot Temporal Action Detection via Vision-Language Prompting | Code | 1
Semi-Supervised Temporal Action Detection with Proposal-Free Masking | Code | 1
ReAct: Temporal Action Detection with Relational Queries | Code | 1
Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning | Code | 1
Online Target Speaker Voice Activity Detection for Speaker Diarization |  | 0
MM-ALT: A Multimodal Automatic Lyric Transcription System | Code | 1
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels | Code | 1
Fine-grained Activities of People Worldwide |  | 0
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription |  | 0
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay | Code | 0
An AIoT-enabled Autonomous Dementia Monitoring System |  | 0
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering | Code | 1
Pyramid Region-based Slot Attention Network for Temporal Action Proposal Generation | Code | 0
One-stage Action Detection Transformer |  | 0
Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection |  | 0
Context-aware Proposal Network for Temporal Action Detection |  | 0
Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios |  | 0
RIS Assisted Device Activity Detection with Statistical Channel State Information |  | 0
GateHUB: Gated History Unit with Background Suppression for Online Action Detection |  | 0
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
TadML: A fast temporal action detection with Mechanics-MLP | Code | 0
Stargazer: A transformer-based driver action detection system for intelligent transportation | Code | 1
Data-aided Active User Detection with a User Activity Extraction Network for Grant-free SCMA Systems |  | 0
Structured Attention Composition for Temporal Action Localization | Code | 2
A Boosting Algorithm for Positive-Unlabeled Learning |  | 0
Double-Sided Information Aided Temporal-Correlated Massive Access |  | 0
ETAD: Training Action Detection End to End on a Laptop | Code | 1
Weakly-Supervised Action Detection Guided by Audio Narration |  | 0
An Empirical Study on Activity Recognition in Long Surgical Videos |  | 0
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | Code | 1
Ultra-sensitive Flexible Sponge-Sensor Array for Muscle Activities Detection and Human Limb Motion Recognition |  | 0
RADNet: A Deep Neural Network Model for Robust Perception in Moving Autonomous Systems |  | 0
Estimation of Reliable Proposal Quality for Temporal Action Detection | Code | 0
ADA-VAD: Unpaired Adversarial Domain Adaptation for Noise-Robust Voice Activity Detection |  | 0
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions | Code | 1
Video Action Detection: Analysing Limitations and Challenges |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 |  | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 |  | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 |  | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 |  | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 |  | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 |  | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 |  | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 |  | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 |  | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 |  | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 |  | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 |  | Unverified
9 | MOC | Frame-mAP 0.5 | 74 |  | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 |  | Unverified
2 | CTRN | mAP | 27.8 |  | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 |  | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 |  | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 |  | Unverified
6 | PAT | mAP | 26.5 |  | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 |  | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 |  | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 |  | Unverified
10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 |  | Unverified
2 | CTRN | mAP | 51.2 |  | Unverified
3 | PDAN | mAP | 47.6 |  | Unverified
4 | TGM | mAP | 46.4 |  | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 |  | Unverified
6 | I3D + our super-event | mAP | 36.4 |  | Unverified
7 | Two-stream + LSTM | mAP | 28.1 |  | Unverified
8 | Two-stream | mAP | 27.6 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 |  | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 |  | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 |  | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 |  | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 |  | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 |  | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 |  | Unverified
2 | TadML-two stream | mAP | 59.7 |  | Unverified
3 | MAT (ours) | mAP | 58.2 |  | Unverified
4 | TadML-rgb | mAP | 53.46 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 |  | Unverified
2 | PDAN | Frame-mAP | 32.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 |  | Unverified
2 | Two Stream Network | IoU | 0.07 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 |  | Unverified
2 | RGB and PRGB | IoU | 0.35 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 |  | Unverified