Action Detection

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Typically, results are given in the form of action tubelets, which are sequences of action bounding boxes linked across frames of the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
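To make the tubelet representation concrete, here is a minimal Python sketch. It assumes each tubelet is stored as a non-empty mapping from frame index to an (x1, y1, x2, y2) box covering a contiguous range of frames; the names `Tubelet`, `box_iou`, `temporal_iou` and `tube_iou` are illustrative and not taken from any particular library. The spatio-temporal overlap follows one common definition: the temporal IoU of the two tubes' frame spans multiplied by the mean per-frame box IoU over the frames both tubes cover. The temporal IoU on its own is the overlap measure typically used for temporal localization.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def box_iou(a: Box, b: Box) -> float:
    """Spatial IoU of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(seg_a: Tuple[int, int], seg_b: Tuple[int, int]) -> float:
    """IoU of two inclusive frame intervals (start_frame, end_frame)."""
    inter = min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (seg_a[1] - seg_a[0] + 1) + (seg_b[1] - seg_b[0] + 1) - inter
    return inter / union


@dataclass
class Tubelet:
    """Hypothetical tubelet: one action label plus boxes linked across frames."""
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    def span(self) -> Tuple[int, int]:
        frames = sorted(self.boxes)
        return frames[0], frames[-1]


def tube_iou(a: Tubelet, b: Tubelet) -> float:
    """Temporal IoU times the mean spatial IoU over frames both tubes cover."""
    t_iou = temporal_iou(a.span(), b.span())
    shared = sorted(set(a.boxes) & set(b.boxes))
    if t_iou == 0.0 or not shared:
        return 0.0
    s_iou = sum(box_iou(a.boxes[f], b.boxes[f]) for f in shared) / len(shared)
    return t_iou * s_iou


gt = Tubelet("run", {f: (10, 10, 50, 90) for f in range(0, 10)})
pred = Tubelet("run", {f: (12, 10, 52, 90) for f in range(5, 15)})
print(round(tube_iou(gt, pred), 3))  # 0.302
```

Here the prediction shares 5 of the 15 frames spanned by the two tubes (temporal IoU 1/3) with a per-frame box IoU of about 0.905, giving a tube overlap of about 0.302.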

Papers

Showing 351–400 of 817 papers

Title | Status | Hype
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms | Code | 0
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022 | Code | 0
The Newsbridge -Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description |  | 0
Deep learning-based approaches for human motion decoding in smart walkers for rehabilitation |  | 0
KIDS: kinematics-based (in)activity detection and segmentation in a sleep case study |  | 0
Ego-Only: Egocentric Action Detection without Exocentric Transferring |  | 0
SkeleTR: Towards Skeleton-based Action Recognition in the Wild |  | 0
Hybrid Active Learning via Deep Clustering for Video Action Detection |  | 0
Movement Enhancement toward Multi-Scale Video Feature Representation for Temporal Action Detection |  | 0
Activity Detection for Grant-Free NOMA in Massive IoT Networks |  | 0
Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features |  | 0
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks |  | 0
Trajectory-User Linking Is Easier Than You Think |  | 0
Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0
BC-VAD: A Robust Bone Conduction Voice Activity Detection |  | 0
Proximal Gradient-Based Unfolding for Massive Random Access in IoT Networks |  | 0
Joint Estimation of Clustered User Activity and Correlated Channels with Unknown Covariance in mMTC |  | 0
Multi-timescale Event Detection in Nonintrusive Load Monitoring based on MDL Principle |  | 0
On using the UA-Speech and TORGO databases to validate automatic dysarthric speech classification approaches |  | 0
Token Turing Machines |  | 0
Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection |  | 0
OFDM-Based Massive Connectivity for LEO Satellite Internet of Things |  | 0
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition |  | 0
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction |  | 0
Handwashing Action Detection System for an Autonomous Social Robot | Code | 0
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge |  | 0
Refining Action Boundaries for One-stage Detection | Code | 0
mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors |  | 0
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization |  | 0
Application-Driven AI Paradigm for Hand-Held Action Detection |  | 0
The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022 |  | 0
Learnable Acoustic Frontends in Bird Activity Detection |  | 0
Signed Latent Factors for Spamming Activity Detection |  | 0
RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow | Code | 0
Joint Speech Activity and Overlap Detection with Multi-Exit Architecture |  | 0
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022 |  | 0
Cross-domain Voice Activity Detection with Self-Supervised Representations |  | 0
GIST-AiTeR System for the Diarization Task of the 2022 VoxCeleb Speaker Recognition Challenge |  | 0
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0
Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices |  | 0
Spatio-Temporal Action Detection Under Large Motion | Code | 0
A Circular Window-based Cascade Transformer for Online Action Detection |  | 0
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization |  | 0
Actor-identified Spatiotemporal Action Detection --- Detecting Who Is Doing What in Videos | Code | 0
Enabling Weakly-Supervised Temporal Action Localization from On-Device Learning of the Video Stream |  | 0
Review on Action Recognition for Accident Detection in Smart City Transportation Systems |  | 0
Weakly Supervised Online Action Detection for Infant General Movements | Code | 0
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos |  | 0
Bodily Behaviors in Social Interaction: Novel Annotations and State-of-the-Art Evaluation |  | 0
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 |  | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 |  | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 |  | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 |  | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 |  | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 |  | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 |  | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 |  | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 |  | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 |  | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 |  | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 |  | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 |  | Unverified
9 | MOC | Frame-mAP 0.5 | 74 |  | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 |  | Unverified
2 | CTRN | mAP | 27.8 |  | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 |  | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 |  | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 |  | Unverified
6 | PAT | mAP | 26.5 |  | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 |  | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 |  | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 |  | Unverified
10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 |  | Unverified
2 | CTRN | mAP | 51.2 |  | Unverified
3 | PDAN | mAP | 47.6 |  | Unverified
4 | TGM | mAP | 46.4 |  | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 |  | Unverified
6 | I3D + our super-event | mAP | 36.4 |  | Unverified
7 | Two-stream + LSTM | mAP | 28.1 |  | Unverified
8 | Two-stream | mAP | 27.6 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 |  | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 |  | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 |  | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 |  | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 |  | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 |  | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 |  | Unverified
2 | TadML-two stream | mAP | 59.7 |  | Unverified
3 | MAT (ours) | mAP | 58.2 |  | Unverified
4 | TadML-rgb | mAP | 53.46 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 |  | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 |  | Unverified
2 | PDAN | Frame-mAP | 32.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 |  | Unverified
2 | Two Stream Network | IoU | 0.07 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 |  | Unverified
2 | RGB and PRGB | IoU | 0.35 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 |  | Unverified
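Most of the metrics in these tables are mean average precision (mAP) at an overlap threshold. As these metrics are commonly defined, Frame-mAP 0.5 counts a per-frame box as correct when its IoU with a same-class ground-truth box is at least 0.5, while Video-mAP 0.2 applies a threshold of 0.2 to the spatio-temporal overlap of whole tubes. The sketch below computes average precision for a single class using greedy, score-ordered matching and all-point interpolation in the style of PASCAL VOC (2010 and later); per-class APs would then be averaged into mAP. The function names and argument layout are hypothetical, and individual benchmarks differ in details such as tie-breaking and interpolation, so treat this as an illustration rather than an official evaluation script.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # any instance type: a box, a time interval, a tube, ...


def average_precision(
    detections: Sequence[Tuple[float, str, T]],  # (score, frame/video id, prediction)
    ground_truth: Dict[str, List[T]],  # id -> ground-truth instances of this class
    overlap: Callable[[T, T], float],  # IoU-style overlap between two instances
    threshold: float,  # e.g. 0.5 for Frame-mAP 0.5, 0.2 for Video-mAP 0.2
) -> float:
    """Single-class AP with greedy matching and all-point interpolation."""
    n_gt = sum(len(v) for v in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {k: [False] * len(v) for k, v in ground_truth.items()}
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    # Visit detections from most to least confident.
    for _score, key, pred in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(key, [])):
            iou = overlap(pred, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        # A hit needs enough overlap with a ground truth not already claimed.
        if best_j >= 0 and best_iou >= threshold and not matched[key][best_j]:
            matched[key][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Interpolation: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum interpolated precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def interval_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """IoU of two [start, end) time intervals, as in temporal detection."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


gt = {"vid1": [(0, 10), (20, 30)]}
dets = [(0.9, "vid1", (0, 9)), (0.8, "vid1", (50, 60)), (0.7, "vid1", (21, 30))]
print(round(average_precision(dets, gt, interval_iou, 0.5), 3))  # 0.833
```

In this toy example the second-ranked detection is a false positive, so precision dips to 0.5 before the third detection recovers the remaining ground truth; interpolation lifts that dip to 2/3, giving AP = 0.5 + (2/3)(0.5) ≈ 0.833. Passing a box IoU or a tube overlap function as `overlap` gives the frame-level or video-level variant.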