
Action Detection

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
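The tubelet representation, and its link to temporal localization, can be made concrete with a minimal sketch. It assumes that a tubelet is stored as a per-frame dictionary of (x1, y1, x2, y2) boxes covering a contiguous range of frames. The names `Tubelet`, `box_iou`, `temporal_iou`, and `tube_iou` are hypothetical. The overlap score multiplies the temporal IoU of the two frame ranges (the quantity temporal localization cares about) by the mean spatial IoU over the frames both tubes cover. This definition is commonly used for video-level evaluation, but individual benchmarks may differ in the details.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Tubelet:
    """Hypothetical tubelet: one box per frame index plus an action label.
    Assumes the frames form a contiguous range."""
    label: str
    boxes: Dict[int, Box]  # frame index -> box

    def span(self) -> Tuple[int, int]:
        # Inclusive (start, end) frame range of the tube.
        return min(self.boxes), max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Spatial IoU of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(s1: Tuple[int, int], s2: Tuple[int, int]) -> float:
    """IoU of two inclusive frame ranges, as used in temporal localization."""
    inter = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1
    if inter <= 0:
        return 0.0
    union = (s1[1] - s1[0] + 1) + (s2[1] - s2[0] + 1) - inter
    return inter / union


def tube_iou(t1: Tubelet, t2: Tubelet) -> float:
    """Temporal IoU times the mean spatial IoU over shared frames."""
    t_iou = temporal_iou(t1.span(), t2.span())
    shared = sorted(set(t1.boxes) & set(t2.boxes))
    if t_iou == 0.0 or not shared:
        return 0.0
    mean_spatial = sum(box_iou(t1.boxes[f], t2.boxes[f]) for f in shared) / len(shared)
    return t_iou * mean_spatial


# Same box, but the prediction covers frames 5-14 while the ground truth covers 0-9.
gt = Tubelet("jump", {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 10)})
pred = Tubelet("jump", {f: (0.0, 0.0, 10.0, 10.0) for f in range(5, 15)})
print(round(tube_iou(gt, pred), 3))  # 0.333
```

In this example the boxes agree perfectly on the 5 shared frames, so the score is limited entirely by the temporal overlap (5 of 15 frames). This is why a detector that localizes well spatially but poorly in time still scores low at the video level.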

Papers

Showing 301-350 of 817 papers

Title | Status | Hype
M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding | | 0
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios | | 0
The Impact of Silence on Speech Anti-Spoofing | | 0
SkeleTR: Towards Skeleton-based Action Recognition in the Wild | | 0
JOADAA: joint online action detection and action anticipation | | 0
Effective Abnormal Activity Detection on Multivariate Time Series Healthcare Data | | 0
In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms | | 0
Self-Feedback DETR for Temporal Action Detection | | 0
Progression-Guided Temporal Action Detection in Videos | Code | 0
The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023 | | 0
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations | Code | 0
PAT: Position-Aware Transformer for Dense Multi-Label Action Detection | | 0
A Survey on Deep Learning-based Spatio-temporal Action Detection | | 0
An enhanced system for the detection and active cancellation of snoring signals | | 0
Human-to-Human Interaction Detection | | 0
Long-term Conversation Analysis: Exploring Utility and Privacy | Code | 0
ShuttleSet: A Human-Annotated Stroke-Level Singles Dataset for Badminton Tactical Analysis | Code | 0
Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features | | 0
Parallel Neurosymbolic Integration with Concordia | | 0
SVVAD: Personal Voice Activity Detection for Speaker Verification | | 0
A Multi-Modal Transformer Network for Action Detection | | 0
Building Accurate Low Latency ASR for Streaming Voice Search | | 0
Joint Activity-Delay Detection and Channel Estimation for Asynchronous Massive Random Access | | 0
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction | | 0
FunASR: A Fundamental End-to-End Speech Recognition Toolkit | | 0
Deep Learning for Asynchronous Massive Access with Data Frame Length Diversity | | 0
Joint Activity Detection and Channel Estimation for Clustered Massive Machine Type Communications | | 0
MRSN: Multi-Relation Support Network for Video Action Detection | | 0
End-to-End Spatio-Temporal Action Localisation with Video Transformers | | 0
Cooperative Multi-Cell Massive Access with Temporally Correlated Activity | | 0
Array Configuration-Agnostic Personal Voice Activity Detection Based on Spatial Coherence | | 0
ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding | | 0
Grant-free Massive Random Access with Retransmission: Receiver Optimization and Performance Analysis | | 0
Boundary-Denoising for Video Activity Localization | Code | 0
Improve Temporal Action Proposals using Hierarchical Context | | 0
DOAD: Decoupled One Stage Action Detection Network | | 0
Evaluation of Noise Reduction Methods for Sentence Recognition by Sinhala Speaking Listeners | Code | 0
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection | | 0
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection | Code | 0
Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV | | 0
End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations | | 0
A processing framework to access large quantities of whispered speech found in ASMR | | 0
Multi-Task Sub-Band Network For Deep Residual Echo Suppression | | 0
Improving Transformer-based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads | | 0
Open Set Action Recognition via Multi-Label Evidential Learning | | 0
Learnable Frontends that do not Learn: Quantifying Sensitivity to Filterbank Initialisation | | 0
MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0
Context Understanding in Computer Vision: A Survey | | 0
Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety | | 0
Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | STAR/L | Frame-mAP 0.5 | 90.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 | | Unverified
4 | HIT | Frame-mAP 0.5 | 84.8 | | Unverified
5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 | | Unverified
6 | YOWO | Frame-mAP 0.5 | 80.4 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | | Unverified
8 | MOC | Frame-mAP 0.5 | 77.8 | | Unverified
9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 | | Unverified
10 | Two-in-one | Video-mAP 0.2 | 75.48 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SiA | Frame-mAP 0.5 | 88.5 | | Unverified
2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 | | Unverified
3 | HIT | Frame-mAP 0.5 | 83.8 | | Unverified
4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 | | Unverified
5 | DTS | Video-mAP 0.2 | 76.1 | | Unverified
6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 | | Unverified
7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | | Unverified
8 | YOWO | Frame-mAP 0.5 | 74.4 | | Unverified
9 | MOC | Frame-mAP 0.5 | 74 | | Unverified
10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TTM | mAP | 28.79 | | Unverified
2 | CTRN | mAP | 27.8 | | Unverified
3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 | | Unverified
4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 | | Unverified
5 | PDAN (RGB+Flow) | mAP | 26.5 | | Unverified
6 | PAT | mAP | 26.5 | | Unverified
7 | MS-TCT (RGB only) | mAP | 25.4 | | Unverified
8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 | | Unverified
9 | Coarse-Fine Networks | mAP | 25.1 | | Unverified
10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MLAD | mAP | 51.5 | | Unverified
2 | CTRN | mAP | 51.2 | | Unverified
3 | PDAN | mAP | 47.6 | | Unverified
4 | TGM | mAP | 46.4 | | Unverified
5 | MS-TCT (RGB only) | mAP | 43.1 | | Unverified
6 | I3D + our super-event | mAP | 36.4 | | Unverified
7 | Two-stream + LSTM | mAP | 28.1 | | Unverified
8 | Two-stream | mAP | 27.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | | Unverified
2 | DTS | Video-mAP 0.2 | 94.3 | | Unverified
3 | Two-in-one | Video-mAP 0.5 | 92.74 | | Unverified
4 | T-CNN | Frame-mAP 0.5 | 86.7 | | Unverified
5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 | | Unverified
6 | TS R-CNN | Frame-mAP 0.5 | 82.3 | | Unverified
7 | Action Tubes | Frame-mAP 0.5 | 68.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAT (Ours) Trans | mAP | 71.6 | | Unverified
2 | TadML-two stream | mAP | 59.7 | | Unverified
3 | MAT (ours) | mAP | 58.2 | | Unverified
4 | TadML-rgb | mAP | 53.46 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HIT | Frame-mAP 0.5 | 33.3 | | Unverified
2 | SiA | Frame-mAP 0.5 | 28.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MS-TCT | Frame-mAP | 33.7 | | Unverified
2 | PDAN | Frame-mAP | 32.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN | IoU | 0.14 | | Unverified
2 | Two Stream Network | IoU | 0.07 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STCNN-V2 (Vote decision) | IoU | 0.52 | | Unverified
2 | RGB and PRGB | IoU | 0.35 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PAT | mAP | 44.6 | | Unverified
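
In the benchmark tables, the number after Frame-mAP or Video-mAP is the IoU threshold a detection must reach to count as correct. Frame-level mAP matches per-frame boxes, and video-level mAP matches whole tubes. The sketch below computes average precision for a single class from a precomputed overlap matrix. It assumes PASCAL VOC-style greedy matching (each detection, in descending score order, is compared with its highest-overlap ground truth) and all-point interpolation. The function name `average_precision` and its parameters are hypothetical, and each benchmark's official evaluation code may differ in matching and interpolation details. mAP is then the mean of this value over action classes.

```python
from typing import Sequence


def average_precision(scores: Sequence[float],
                      ious: Sequence[Sequence[float]],
                      num_gt: int,
                      threshold: float = 0.5) -> float:
    """Hypothetical single-class AP.

    scores[d]  : confidence of detection d
    ious[d][g] : overlap (box IoU or tube IoU) of detection d with ground truth g
    threshold  : minimum overlap for a true positive (e.g. 0.5 or 0.2)
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    matched = set()
    tp_flags = []
    for d in order:
        # VOC-style: look only at the best-overlapping ground truth.
        best_g = max(range(num_gt), key=lambda g: ious[d][g])
        if ious[d][best_g] >= threshold and best_g not in matched:
            matched.add(best_g)
            tp_flags.append(1)
        else:
            tp_flags.append(0)  # too little overlap, or a duplicate detection

    # Cumulative precision and recall after each ranked detection.
    tp = fp = 0
    precisions, recalls = [], []
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # All-point interpolation: make precision non-increasing from the right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


scores = [0.9, 0.8, 0.7]
ious = [[0.6, 0.1], [0.3, 0.2], [0.1, 0.55]]
print(round(average_precision(scores, ious, num_gt=2, threshold=0.5), 3))  # 0.833
print(round(average_precision(scores, ious, num_gt=2, threshold=0.6), 3))  # 0.5
```

The same detections score lower under the stricter threshold because the third detection (overlap 0.55) stops counting as a true positive. This is one reason numbers reported at different thresholds, such as Video-mAP 0.2 and Video-mAP 0.5, are not directly comparable.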