SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
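As a rough illustration of the tubelet idea described above, the following Python sketch stores one action instance as per-frame boxes and links detections across frames by overlap. The names `ActionTubelet`, `box_iou`, and `link_greedy`, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are all assumptions made for this example. The greedy linking is a simple stand-in and is not the method of any paper listed on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ActionTubelet:
    """Hypothetical container: one action instance as boxes linked across frames."""
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    @property
    def start_frame(self) -> int:
        # "When" the action starts (requires at least one linked box).
        return min(self.boxes)

    @property
    def end_frame(self) -> int:
        # "When" the action ends (requires at least one linked box).
        return max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_greedy(label: str, detections: Dict[int, List[Box]],
                min_iou: float = 0.5) -> ActionTubelet:
    """Greedily extend a tubelet frame by frame.

    Starts from the first box of the earliest frame, then in each later
    frame keeps the box that overlaps most with the previously linked box.
    Linking stops when a frame has no boxes or the best overlap is below
    min_iou.
    """
    tube = ActionTubelet(label)
    for frame in sorted(detections):
        candidates = detections[frame]
        if not candidates:
            break
        if not tube.boxes:
            tube.boxes[frame] = candidates[0]
            continue
        prev = tube.boxes[tube.end_frame]
        best = max(candidates, key=lambda b: box_iou(prev, b))
        if box_iou(prev, best) < min_iou:
            break
        tube.boxes[frame] = best
    return tube


# Toy detections: the box drifts slightly in frame 1 and jumps away in frame 2.
detections = {
    0: [(0, 0, 10, 10)],
    1: [(1, 1, 11, 11), (50, 50, 60, 60)],
    2: [(100, 100, 110, 110)],
}
tube = link_greedy("wave", detections)
print(tube.start_frame, tube.end_frame)
```

In this toy input the tubelet covers frames 0 and 1 only, because the frame 2 box does not overlap the previously linked box. The start and end frames of the tubelet correspond to the temporal localization part of the task, and the per-frame boxes to the spatial part.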

Papers

Showing 651–700 of 817 papers

| Title | Status | Hype |
|---|---|---|
| Recursive Binary Neural Network Learning Model with 2-bit/weight Storage Requirement | | 0 |
| Reformulating Zero-shot Action Recognition for Multi-label Actions | | 0 |
| Relation Modeling in Spatio-Temporal Action Localization | | 0 |
| Review on Action Recognition for Accident Detection in Smart City Transportation Systems | | 0 |
| Revisiting Few-shot Activity Detection with Class Similarity Control | | 0 |
| RIS Assisted Device Activity Detection with Statistical Channel State Information | | 0 |
| Risk Analysis and Prevention: LELIE, a Tool dedicated to Procedure and Requirement Authoring | | 0 |
| Zero-Shot Imitating Collaborative Manipulation Plans from YouTube Cooking Videos | | 0 |
| Robust Activity Detection for Massive Random Access | | 0 |
| Robust Learning-Based Sparse Recovery for Device Activity Detection in Grant-Free Random Access Cell-Free Massive MIMO: Enhancing Resilience to Impairments | | 0 |
| Robust Two-Stream Multi-Feature Network for Driver Drowsiness Detection | | 0 |
| SALAD: Self-Assessment Learning for Action Detection | | 0 |
| SCC: Semantic Context Cascade for Efficient Action Detection | | 0 |
| SegCodeNet: Color-Coded Segmentation Masks for Activity Detection from Wearable Cameras | | 0 |
| Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection | | 0 |
| SegTAD: Precise Temporal Action Detection via Semantic Segmentation | | 0 |
| Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification | | 0 |
| Self-Denoising Neural Networks for Few Shot Learning | | 0 |
| Self-Feedback DETR for Temporal Action Detection | | 0 |
| Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions | | 0 |
| Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction | | 0 |
| Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech | | 0 |
| Semi-supervised Acoustic Modelling for Five-lingual Code-switched ASR using Automatically-segmented Soap Opera Speech | | 0 |
| Sensing Framework Design and Performance Optimization with Action Detection for ISCC | | 0 |
| Sequence Block based Compressed Sensing Multiuser Detection for 5G | | 0 |
| Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation | | 0 |
| Siamese Neural Networks for Class Activity Detection | | 0 |
| Signed Latent Factors for Spamming Activity Detection | | 0 |
| Similarity R-C3D for Few-shot Temporal Activity Detection | | 0 |
| Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios | | 0 |
| Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments | | 0 |
| Skeleton Boxes: Solving skeleton based action detection with a single deep convolutional neural network | | 0 |
| SkeleTR: Towards Skeleton-based Action Recognition in the Wild | | 0 |
| SkeleTR: Towrads Skeleton-based Action Recognition in the Wild | | 0 |
| Smart Black Box 2.0: Efficient High-bandwidth Driving Data Collection based on Video Anomalies | | 0 |
| Sparse Activity Discovery in Energy Constrained Multi-Cluster IoT Networks Using Group Testing | | 0 |
| Sparse Signal Processing for Massive Connectivity via Mixed-Integer Programming | | 0 |
| Spatial Correlation Aware Compressed Sensing for User Activity Detection and Channel Estimation in Massive MTC | | 0 |
| Spatial Morphing Kernel Regression For Feature Interpolation | | 0 |
| Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding | Code | 0 |
| MaCLR: Motion-aware Contrastive Learning of Representations for Videos | Code | 0 |
| Weakly-guided Self-supervised Pretraining for Temporal Activity Detection | Code | 0 |
| Modality Distillation with Multiple Stream Networks for Action Recognition | Code | 0 |
| Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints | Code | 0 |
| Am I Done? Predicting Action Progress in Videos | Code | 0 |
| TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR | Code | 0 |
| Semi-supervised Active Learning for Video Action Detection | Code | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0 |
| MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0 |
| Weakly-supervised Visual Instrument-playing Action Detection in Videos | Code | 0 |

Benchmark Results

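| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified |
| 4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified |
| 5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified |
| 6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified |
| 8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified |
| 9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified |
| 10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified |
| 3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified |
| 4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified |
| 5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified |
| 6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified |
| 8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified |
| 9 | MOC | Frame-mAP 0.5 | 74 | | Unverified |
| 10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TTM | mAP | 28.79 | | Unverified |
| 2 | CTRN | mAP | 27.8 | | Unverified |
| 3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified |
| 4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified |
| 5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified |
| 6 | PAT | mAP | 26.5 | | Unverified |
| 7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified |
| 8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified |
| 9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified |
| 10 | MLAD (RGB + Flow) | mAP | 23.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MLAD | mAP | 51.5 | | Unverified |
| 2 | CTRN | mAP | 51.2 | | Unverified |
| 3 | PDAN | mAP | 47.6 | | Unverified |
| 4 | TGM | mAP | 46.4 | | Unverified |
| 5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified |
| 6 | I3D + our super-event | mAP | 36.4 | | Unverified |
| 7 | Two-stream + LSTM | mAP | 28.1 | | Unverified |
| 8 | Two-stream | mAP | 27.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified |
| 2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified |
| 3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified |
| 4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified |
| 5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified |
| 6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified |
| 7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified |
| 2 | TadML-two stream | mAP | 59.7 | | Unverified |
| 3 | MAT (ours) | mAP | 58.2 | | Unverified |
| 4 | TadML-rgb | mAP | 53.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MS-TCT | Frame-mAP | 33.7 | | Unverified |
| 2 | PDAN | Frame-mAP | 32.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN | IoU | 0.14 | | Unverified |
| 2 | Two Stream Network | IoU | 0.07 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified |
| 2 | RGB and PRGB | IoU | 0.35 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PAT | mAP | 44.6 | | Unverified |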