SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
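As a rough illustration of the description above, the sketch below represents a detection as per-frame boxes under one label, reads off its start and end frames, and scores its overlap with a ground-truth tube. The `Tubelet` class, the `(x1, y1, x2, y2)` box layout, and the choice of spatio-temporal IoU (temporal IoU multiplied by the mean box IoU over shared frames) are assumptions made for illustration; individual benchmarks may define overlap differently.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus boxes linked across frames."""
    label: str
    boxes: Dict[int, Box]  # frame index -> box

    def temporal_extent(self) -> Tuple[int, int]:
        """Start and end frame, i.e. the 'when' of the detection."""
        return min(self.boxes), max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tubelet_iou(pred: Tubelet, gt: Tubelet) -> float:
    """Assumed convention: temporal IoU times mean spatial IoU on shared frames."""
    shared = pred.boxes.keys() & gt.boxes.keys()
    if not shared:
        return 0.0
    all_frames = pred.boxes.keys() | gt.boxes.keys()
    temporal = len(shared) / len(all_frames)
    spatial = sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in shared) / len(shared)
    return temporal * spatial


if __name__ == "__main__":
    square = (0.0, 0.0, 10.0, 10.0)
    pred = Tubelet("run", {0: square, 1: square, 2: square})
    gt = Tubelet("run", {1: square, 2: square, 3: square})
    print(pred.temporal_extent(), tubelet_iou(pred, gt))
```

A detection would then usually count as correct when its label matches and the overlap exceeds a chosen threshold, which is the role played by the 0.5 and 0.2 values attached to the mAP metrics in the benchmark results.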

Papers

Showing 151-200 of 817 papers

Title | Status | Hype
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | Code | 0
Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol | Code | 0
Review of Action Recognition and Detection Methods | Code | 0
Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints | Code | 0
RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns | Code | 0
SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization | Code | 0
A Framework for Adapting Human-Robot Interaction to Diverse User Groups | Code | 0
Refining Action Boundaries for One-stage Detection | Code | 0
Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN | Code | 0
A flexible model for training action localization with varying levels of supervision | Code | 0
RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow | Code | 0
Skeleton-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection | Code | 0
Pyramid Region-based Slot Attention Network for Temporal Action Proposal Generation | Code | 0
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection | Code | 0
Scaling Open-Vocabulary Action Detection | Code | 0
Progression-Guided Temporal Action Detection in Videos | Code | 0
A Stronger Baseline for Ego-Centric Action Detection | Code | 0
Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System | Code | 0
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay | Code | 0
PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection | Code | 0
ACDnet: An action detection network for real-time edge computing based on flow-guided feature approximation and memory aggregation | Code | 0
A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning | Code | 0
Personal VAD: Speaker-Conditioned Voice Activity Detection | Code | 0
Protest Activity Detection and Perceived Violence Estimation from Social Media Images | Code | 0
Optimizing Large Language Models for ESG Activity Detection in Financial Texts | Code | 0
Argus: Efficient Activity Detection System for Extended Video Analysis | Code | 0
Multi-Stage Speaker Diarization for Noisy Classrooms | Code | 0
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes | Code | 0
One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features | Code | 0
Online Human Action Detection using Joint Classification-Regression Recurrent Neural Networks | Code | 0
Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0
Online Spatiotemporal Action Detection and Prediction via Causal Representations | Code | 0
Personalized Activity Recognition with Deep Triplet Embeddings | Code | 0
Weakly-guided Self-supervised Pretraining for Temporal Activity Detection | Code | 0
MaCLR: Motion-aware Contrastive Learning of Representations for Videos | Code | 0
MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0
Actor-identified Spatiotemporal Action Detection --- Detecting Who Is Doing What in Videos | Code | 0
SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Code | 0
Modality Distillation with Multiple Stream Networks for Action Recognition | Code | 0
Long-term Conversation Analysis: Exploring Utility and Privacy | Code | 0
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos | Code | 0
Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection | Code | 0
Learning to Anonymize Faces for Privacy Preserving Action Detection | Code | 0
Learning to Discriminate Information for Online Action Detection | Code | 0
Actor Conditioned Attention Maps for Video Action Detection | Code | 0
Coarse-Fine Networks for Temporal Activity Detection in Videos | Code | 0
A Pursuit of Temporal Accuracy in General Activity Detection | Code | 0
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations | Code | 0
Actor-Centric Relation Network | Code | 0
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 | - | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 | - | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | - | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 | - | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | - | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 | - | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | - | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 | - | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | - | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 | - | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | - | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 | - | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | - | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 | - | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | - | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | - | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 | - | Unverified
9 | MOC | Frame-mAP 0.5 | 74 | - | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 | - | Unverified
2 | CTRN | mAP | 27.8 | - | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | - | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | - | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 | - | Unverified
6 | PAT | mAP | 26.5 | - | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 | - | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | - | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 | - | Unverified
10 | MLAD (RGB + Flow) | mAP | 23.7 | - | Unverified
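# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 | - | Unverified
2 | CTRN | mAP | 51.2 | - | Unverified
3 | PDAN | mAP | 47.6 | - | Unverified
4 | TGM | mAP | 46.4 | - | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 | - | Unverified
6 | I3D + our super-event | mAP | 36.4 | - | Unverified
7 | Two-stream + LSTM | mAP | 28.1 | - | Unverified
8 | Two-stream | mAP | 27.6 | - | Unverified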
# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | - | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 | - | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 | - | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 | - | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | - | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | - | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 | - | Unverified
2 | TadML-two stream | mAP | 59.7 | - | Unverified
3 | MAT (ours) | mAP | 58.2 | - | Unverified
4 | TadML-rgb | mAP | 53.46 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 | - | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 | - | Unverified
2 | PDAN | Frame-mAP | 32.7 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 | - | Unverified
2 | Two Stream Network | IoU | 0.07 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 | - | Unverified
2 | RGB and PRGB | IoU | 0.35 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 | - | Unverified