SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
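To make the tubelet idea concrete, here is a minimal Python sketch of boxes linked across time. It is illustrative only and not taken from any paper listed on this page: the names `Tubelet`, `box_iou`, and `link_detections`, the greedy linking rule, and the 0.5 overlap threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical pixel coordinates


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """One action instance: a class label plus one box per frame (the 'where')."""
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    @property
    def start(self) -> int:  # first frame of the action (the 'when')
        return min(self.boxes)

    @property
    def end(self) -> int:  # last frame of the action
        return max(self.boxes)


def link_detections(
    frames: List[List[Tuple[str, Box]]], iou_thresh: float = 0.5
) -> List[Tubelet]:
    """Greedily link per-frame detections into tubelets (assumed linking rule).

    A detection extends an existing tubelet when the labels match, the tubelet
    ended on the previous frame, and the boxes overlap by at least iou_thresh.
    Otherwise it starts a new tubelet.
    """
    tubelets: List[Tubelet] = []
    for t, detections in enumerate(frames):
        for label, box in detections:
            best: Optional[Tubelet] = None
            best_iou = iou_thresh
            for tube in tubelets:
                if tube.label != label or tube.end != t - 1:
                    continue
                overlap = box_iou(tube.boxes[tube.end], box)
                if overlap >= best_iou:
                    best, best_iou = tube, overlap
            if best is None:
                best = Tubelet(label)
                tubelets.append(best)
            best.boxes[t] = box
    return tubelets


# Toy input: the same action in frames 0-1, nothing in frame 2, then again in frame 3.
frames = [
    [("run", (0.0, 0.0, 10.0, 10.0))],
    [("run", (1.0, 0.0, 11.0, 10.0))],
    [],
    [("run", (1.0, 0.0, 11.0, 10.0))],
]
tubes = link_detections(frames)
for tube in tubes:
    print(tube.label, tube.start, tube.end)
```

In this toy input the gap at frame 2 splits the detections into two tubelets. In terms of the related tasks above, the `start` and `end` of a tubelet correspond to what temporal localization outputs, and the `label` alone corresponds to what action recognition outputs.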

Papers

Showing 101–150 of 817 papers

| Title | Status | Hype |
|---|---|---|
| MMAD: Multi-label Micro-Action Detection in Videos | Code | 1 |
| TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR | Code | 0 |
| Micro-gesture Online Recognition using Learnable Query Points | | 0 |
| DyFADet: Dynamic Feature Aggregation for Temporal Action Detection | Code | 1 |
| Automatic Speech Recognition for Hindi | | 0 |
| Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Code | 0 |
| Using joint angles based on the international biomechanical standards for human action recognition and related tasks | | 0 |
| Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 | | 0 |
| AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming | | 0 |
| Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness | | 0 |
| Vessel Re-identification and Activity Detection in Thermal Domain for Maritime Surveillance | | 0 |
| Deep Learning-Based Approach for User Activity Detection with Grant-Free Random Access in Cell-Free Massive MIMO | | 0 |
| An Effective-Efficient Approach for Dense Multi-Label Action Detection | | 0 |
| InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation | Code | 1 |
| Precise Analysis of Covariance Identifiability for Activity Detection in Grant-Free Random Access | | 0 |
| Object Aware Egocentric Online Action Detection | | 0 |
| Skeleton-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection | Code | 0 |
| MALT: Multi-scale Action Learning Transformer for Online Action Detection | | 0 |
| A Real-Time Voice Activity Detection Based On Lightweight Neural | | 0 |
| Open-Vocabulary Spatio-Temporal Action Detection | | 0 |
| Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization | | 0 |
| No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | Code | 1 |
| A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection | | 0 |
| Whispy: Adapting STT Whisper Models to Real-Time Environments | | 0 |
| Activity Detection for Massive Random Access using Covariance-based Matching Pursuit | | 0 |
| One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features | Code | 0 |
| FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method | | 0 |
| A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation | | 0 |
| Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection | | 0 |
| STMixer: A One-Stage Sparse Action Detector | | 0 |
| TIM: A Time Interval Machine for Audio-Visual Action Recognition | Code | 2 |
| UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Code | 2 |
| TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression | Code | 1 |
| Action Detection via an Image Diffusion Process | | 0 |
| Dual DETRs for Multi-Label Temporal Action Detection | | 0 |
| Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Code | 1 |
| Deep Learning-Assisted Parallel Interference Cancellation for Grant-Free NOMA in Machine-Type Communication | | 0 |
| Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications | | 0 |
| Detection of Object Throwing Behavior in Surveillance Videos | | 0 |
| sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks | | 0 |
| High-speed Low-consumption sEMG-based Transient-state micro-Gesture Recognition | | 0 |
| Fast Low-parameter Video Activity Localization in Collaborative Learning Environments | | 0 |
| Joint Activity-Delay Detection and Channel Estimation for Asynchronous Massive Random Access: A Free Probability Theory Approach | | 0 |
| Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection | | 0 |
| Device Activity Detection and Channel Estimation for Millimeter-Wave Massive MIMO | | 0 |
| A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model | | 0 |
| Joint User Detection and Localization in Near-Field Using Reconfigurable Intelligent Surfaces | | 0 |
| Online speaker diarization of meetings guided by speech separation | Code | 1 |
| Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model | | 0 |
| Self-supervised New Activity Detection in Sensor-based Smart Environments | | 0 |

Benchmark Results

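| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified |
| 4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified |
| 5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified |
| 6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified |
| 8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified |
| 9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified |
| 10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified |
| 2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified |
| 3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified |
| 4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified |
| 5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified |
| 6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified |
| 8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified |
| 9 | MOC | Frame-mAP 0.5 | 74 | | Unverified |
| 10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TTM | mAP | 28.79 | | Unverified |
| 2 | CTRN | mAP | 27.8 | | Unverified |
| 3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified |
| 4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified |
| 5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified |
| 6 | PAT | mAP | 26.5 | | Unverified |
| 7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified |
| 8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified |
| 9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified |
| 10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MLAD | mAP | 51.5 | | Unverified |
| 2 | CTRN | mAP | 51.2 | | Unverified |
| 3 | PDAN | mAP | 47.6 | | Unverified |
| 4 | TGM | mAP | 46.4 | | Unverified |
| 5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified |
| 6 | I3D + our super-event | mAP | 36.4 | | Unverified |
| 7 | Two-stream + LSTM | mAP | 28.1 | | Unverified |
| 8 | Two-stream | mAP | 27.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified |
| 2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified |
| 3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified |
| 4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified |
| 5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified |
| 6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified |
| 7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified |
| 2 | TadML-two stream | mAP | 59.7 | | Unverified |
| 3 | MAT (ours) | mAP | 58.2 | | Unverified |
| 4 | TadML-rgb | mAP | 53.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MS-TCT | Frame-mAP | 33.7 | | Unverified |
| 2 | PDAN | Frame-mAP | 32.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN | IoU | 0.14 | | Unverified |
| 2 | Two Stream Network | IoU | 0.07 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified |
| 2 | RGB and PRGB | IoU | 0.35 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PAT | mAP | 44.6 | | Unverified |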