SOTAVerified

Action Detection

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
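
To make the notion of a tubelet concrete, here is a minimal Python sketch. It assumes that per-frame detections are already available as (label, box) pairs. The names `Tubelet`, `iou`, and `link_detections` are hypothetical, and the greedy IoU-based linking rule with a 0.5 threshold is one simple illustrative choice, not the method of any specific paper listed on this page. A tubelet's `start` and `end` frames correspond to what temporal localization outputs, and its per-frame boxes add the spatial extent.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus a box per frame."""
    label: str
    boxes: dict[int, Box] = field(default_factory=dict)  # frame index -> box

    @property
    def start(self) -> int:
        return min(self.boxes)  # first frame of the action

    @property
    def end(self) -> int:
        return max(self.boxes)  # last frame of the action


def link_detections(frames, iou_thresh=0.5):
    """Greedily link per-frame detections into tubelets.

    frames: list indexed by frame number; each entry is a list of (label, box).
    A detection extends an active tubelet with the same label if its IoU with
    that tubelet's box in the previous frame is at least iou_thresh;
    otherwise it starts a new tubelet.
    """
    finished, active = [], []
    for t, detections in enumerate(frames):
        still_active = []
        unmatched = list(detections)
        for tube in active:
            last_box = tube.boxes[t - 1]  # every active tubelet has a box at t-1
            best, best_iou = None, iou_thresh
            for det in unmatched:
                label, box = det
                score = iou(last_box, box)
                if label == tube.label and score >= best_iou:
                    best, best_iou = det, score
            if best is not None:
                tube.boxes[t] = best[1]
                unmatched.remove(best)
                still_active.append(tube)
            else:
                finished.append(tube)  # no continuation: tubelet ends at t-1
        # Detections not claimed by any tubelet start new ones.
        for label, box in unmatched:
            still_active.append(Tubelet(label, {t: box}))
        active = still_active
    return finished + active
```

For example, three frames in which a "run" box drifts slightly to the right, plus a "jump" box that appears only in the last frame, yield two tubelets: "run" spanning frames 0 to 2 and "jump" covering frame 2 alone. Greedy frame-to-frame linking is cheap but cannot recover from a single missed detection; more robust schemes score whole paths across frames.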

Papers

Showing 351-400 of 817 papers

Title | Status | Hype
Fotheidil: an Automatic Transcription System for the Irish Language |  | 0
Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection |  | 0
Preemptive Detection and Correction of Misaligned Actions in LLM Agents |  | 0
Information Elevation Network for Fast Online Action Detection |  | 0
Combination of Deep Speaker Embeddings for Diarisation |  | 0
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization |  | 0
Boundary-Recovering Network for Temporal Action Detection |  | 0
Intelligent Video Recording Optimization using Activity Detection for Surveillance Systems |  | 0
An Empirical Study on Activity Recognition in Long Surgical Videos |  | 0
MALT: Multi-scale Action Learning Transformer for Online Action Detection |  | 0
Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos |  | 0
Investigation of Speaker Representation for Target-Speaker Speech Processing |  | 0
Iterative Reweighted Algorithms for Joint User Identification and Channel Estimation in Spatially Correlated Massive MTC |  | 0
結合I-Vector 及深層神經網路之語者驗證系統 (Text-independent Speaker Verification using a Hybrid I-Vector/DNN Approach) [In Chinese] |  | 0
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems |  | 0
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling |  | 0
JOADAA: joint online action detection and action anticipation |  | 0
Joint Activity-Delay Detection and Channel Estimation for Asynchronous Massive Random Access |  | 0
Joint Activity-Delay Detection and Channel Estimation for Asynchronous Massive Random Access: A Free Probability Theory Approach |  | 0
Joint Activity Detection and Channel Estimation for Clustered Massive Machine Type Communications |  | 0
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors |  | 0
Joint Activity Detection and Data Decoding in Massive Random Access via a Turbo Receiver |  | 0
Joint Activity Detection, Channel Estimation, and Data Decoding for Grant-free Massive Random Access |  | 0
Activity Detection from Wearable Electromyogram Sensors using Hidden Markov Model |  | 0
Long-Term Conversation Analysis: Privacy-Utility Trade-off under Noise and Reverberation |  | 0
Joint Estimation of Clustered User Activity and Correlated Channels with Unknown Covariance in mMTC |  | 0
An Effective-Efficient Approach for Dense Multi-Label Action Detection |  | 0
Jointly Detecting and Separating Singing Voice: A Multi-Task Approach |  | 0
Fine-grained Activities of People Worldwide |  | 0
Joint Speech Activity and Overlap Detection with Multi-Exit Architecture |  | 0
Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization |  | 0
Joint User Activity and Data Detection in Grant-Free NOMA using Generative Neural Networks |  | 0
A Bin Encoding Training of a Spiking Neural Network-based Voice Activity Detection |  | 0
Long-term Pre-training for Temporal Action Detection with Transformers |  | 0
Boundary Content Graph Neural Network for Temporal Action Proposal Generation |  | 0
Kernel-based Sensor Fusion with Application to Audio-Visual Voice Activity Detection |  | 0
KIDS: kinematics-based (in)activity detection and segmentation in a sleep case study |  | 0
Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model |  | 0
LAP-Net: Adaptive Features Sampling via Learning Action Progression for Online Action Detection |  | 0
Learnable Acoustic Frontends in Bird Activity Detection |  | 0
Long Short-Term Relation Networks for Video Action Detection |  | 0
Finding Action Tubes with a Sparse-to-Dense Framework |  | 0
Bodily Behaviors in Social Interaction: Novel Annotations and State-of-the-Art Evaluation |  | 0
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection |  | 0
Learning Proximal Operator Methods for Massive Connectivity in IoT Networks |  | 0
Learning recurrent representations for hierarchical behavior modeling |  | 0
Convolutional Neural Networks for Aerial Multi-Label Pedestrian Detection |  | 0
Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation |  | 0
An Automated Machine Learning Framework for Surgical Suturing Action Detection under Class Imbalance |  | 0
Low-power, Continuous Remote Behavioral Localization with Event Cameras |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 |  | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 |  | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 |  | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 |  | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 |  | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 |  | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 |  | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 |  | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 |  | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 |  | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 |  | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 |  | Unverified
9 | MOC | Frame-mAP 0.5 | 74 |  | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 |  | Unverified
2 | CTRN | mAP | 27.8 |  | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 |  | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 |  | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 |  | Unverified
6 | PAT | mAP | 26.5 |  | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 |  | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 |  | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 |  | Unverified
10 | MLAD (RGB + Flow) | mAP | 23.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 |  | Unverified
2 | CTRN | mAP | 51.2 |  | Unverified
3 | PDAN | mAP | 47.6 |  | Unverified
4 | TGM | mAP | 46.4 |  | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 |  | Unverified
6 | I3D + our super-event | mAP | 36.4 |  | Unverified
7 | Two-stream + LSTM | mAP | 28.1 |  | Unverified
8 | Two-stream | mAP | 27.6 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 |  | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 |  | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 |  | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 |  | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 |  | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 |  | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 |  | Unverified
2 | TadML-two stream | mAP | 59.7 |  | Unverified
3 | MAT (ours) | mAP | 58.2 |  | Unverified
4 | TadML-rgb | mAP | 53.46 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 |  | Unverified
2 | PDAN | Frame-mAP | 32.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 |  | Unverified
2 | Two Stream Network | IoU | 0.07 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 |  | Unverified
2 | RGB and PRGB | IoU | 0.35 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 |  | Unverified