SOTAVerified

Action Detection

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. This is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
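
To make the idea of bounding boxes "linked across time" concrete, here is a minimal sketch of one way per-frame detections could be chained into tubelets. It is an illustration, not the method of any paper listed on this page: the greedy IoU matching rule, the 0.5 threshold, and the names `Tubelet`, `box_iou`, and `link_detections` are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus one box per frame."""
    label: str
    start_frame: int
    boxes: list[Box] = field(default_factory=list)

    @property
    def end_frame(self) -> int:
        return self.start_frame + len(self.boxes) - 1


def link_detections(
    frames: list[list[tuple[str, Box]]], iou_threshold: float = 0.5
) -> list[Tubelet]:
    """Greedily extend each open tubelet with the best same-label box
    in the next frame; close it when no box overlaps enough."""
    finished: list[Tubelet] = []
    active: list[Tubelet] = []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        still_active: list[Tubelet] = []
        for tube in active:
            best, best_iou = None, iou_threshold
            for det in unmatched:
                label, box = det
                if label != tube.label:
                    continue
                iou = box_iou(tube.boxes[-1], box)
                if iou >= best_iou:
                    best, best_iou = det, iou
            if best is None:
                finished.append(tube)  # action ended before frame t
            else:
                tube.boxes.append(best[1])
                unmatched.remove(best)
                still_active.append(tube)
        # Every detection left over starts a new tubelet at frame t.
        for label, box in unmatched:
            still_active.append(Tubelet(label, t, [box]))
        active = still_active
    return finished + active
```

Each resulting tubelet answers all three questions in the task definition: its boxes say where, its start and end frames say when, and its label says what.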

Papers

Showing 51–100 of 817 papers

| Title | Status | Hype |
|---|---|---|
| WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network | Code | 1 |
| JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts | Code | 0 |
| USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation | Code | 1 |
| Comparative Analysis of Deep Learning Approaches for Harmful Brain Activity Detection Using EEG | | 0 |
| Stable Mean Teacher for Semi-supervised Video Action Detection | Code | 0 |
| Asynchronous Random Access in Massive MIMO Systems Facilitated by the Delay-Angle Domain | | 0 |
| Continual Low-Rank Scaled Dot-product Attention | | 0 |
| Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Code | 0 |
| Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation | | 0 |
| Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection | Code | 1 |
| A Flexible Framework for Grant-Free Random Access in Cell-Free Massive MIMO Systems | | 0 |
| Transferable Adversarial Attacks against ASR | | 0 |
| On the Detection of Non-Cooperative RISs: Scan B-Testing via Deep Support Vector Data Description | | 0 |
| User Activity Detection with Delay-Calibration for Asynchronous Massive Random Access | | 0 |
| Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization | | 0 |
| Intelligent Video Recording Optimization using Activity Detection for Surveillance Systems | | 0 |
| On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes | Code | 0 |
| ContextDet: Temporal Action Detection with Adaptive Context Aggregation | | 0 |
| CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection | | 0 |
| A Framework for Adapting Human-Robot Interaction to Diverse User Groups | Code | 0 |
| Investigation of Speaker Representation for Target-Speaker Speech Processing | | 0 |
| Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection | Code | 0 |
| EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts | | 0 |
| Query matching for spatio-temporal action detection with query-based object detector | | 0 |
| Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks | | 0 |
| Raising the Bar(ometer): Identifying a User's Stair and Lift Usage Through Wearable Sensor Data Analysis | | 0 |
| Moshi: a speech-text foundation model for real-time dialogue | Code | 9 |
| M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses | | 0 |
| TCG CREST System Description for the Second DISPLACE Challenge | | 0 |
| Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection | Code | 0 |
| A Comprehensive Methodological Survey of Human Activity Recognition Across Divers Data Modalities | | 0 |
| NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge | | 0 |
| Evaluation of real-time transcriptions using end-to-end ASR models | | 0 |
| Introducing Gating and Context into Temporal Action Detection | | 0 |
| Unfolding Videos Dynamics via Taylor Expansion | | 0 |
| Towards Student Actions in Classroom Scenes: New Dataset and Baseline | Code | 1 |
| Prediction-Feedback DETR for Temporal Action Detection | | 0 |
| Spatio-Temporal Context Prompting for Zero-Shot Action Detection | | 0 |
| Temporal Divide-and-Conquer Anomaly Actions Localization in Semi-Supervised Videos with Hierarchical Transformer | | 0 |
| Long-term Pre-training for Temporal Action Detection with Transformers | | 0 |
| Boundary-Recovering Network for Temporal Action Detection | | 0 |
| JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling | | 0 |
| Blind User Activity Detection for Grant-Free Random Access in Cell-Free mMIMO Networks | | 0 |
| YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition | Code | 2 |
| Long-Term Conversation Analysis: Privacy-Utility Trade-off under Noise and Reverberation | | 0 |
| Classification Matters: Improving Video Action Detection with Class-Specific Attention | | 0 |
| MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos | Code | 0 |
| Harnessing Temporal Causality for Advanced Temporal Action Detection | Code | 3 |
| ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos | Code | 1 |
| Preemptive Detection and Correction of Misaligned Actions in LLM Agents | | 0 |

Benchmark Results

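| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified |
| 4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified |
| 5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified |
| 6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified |
| 8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified |
| 9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified |
| 10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified |
| 3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified |
| 4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified |
| 5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified |
| 6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified |
| 8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified |
| 9 | MOC | Frame-mAP 0.5 | 74 | | Unverified |
| 10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TTM | mAP | 28.79 | | Unverified |
| 2 | CTRN | mAP | 27.8 | | Unverified |
| 3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified |
| 4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified |
| 5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified |
| 6 | PAT | mAP | 26.5 | | Unverified |
| 7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified |
| 8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified |
| 9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified |
| 10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MLAD | mAP | 51.5 | | Unverified |
| 2 | CTRN | mAP | 51.2 | | Unverified |
| 3 | PDAN | mAP | 47.6 | | Unverified |
| 4 | TGM | mAP | 46.4 | | Unverified |
| 5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified |
| 6 | I3D + our super-event | mAP | 36.4 | | Unverified |
| 7 | Two-stream + LSTM | mAP | 28.1 | | Unverified |
| 8 | Two-stream | mAP | 27.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified |
| 2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified |
| 3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified |
| 4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified |
| 5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified |
| 6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified |
| 7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified |
| 2 | TadML-two stream | mAP | 59.7 | | Unverified |
| 3 | MAT (ours) | mAP | 58.2 | | Unverified |
| 4 | TadML-rgb | mAP | 53.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MS-TCT | Frame-mAP | 33.7 | | Unverified |
| 2 | PDAN | Frame-mAP | 32.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN | IoU | 0.14 | | Unverified |
| 2 | Two Stream Network | IoU | 0.07 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified |
| 2 | RGB and PRGB | IoU | 0.35 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PAT | mAP | 44.6 | | Unverified |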