SOTAVerified

Action Detection

Action detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets, which are action bounding boxes linked across time in the video. The task is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
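As an illustration of the tubelet idea, here is a minimal Python sketch, not taken from any of the listed papers. The `Tubelet` class, the `box_iou` and `tube_iou` helpers, and the example boxes are all hypothetical names and values chosen for this page. The overlap score used here (temporal overlap multiplied by the mean per-frame box IoU over shared frames) is one common convention for comparing tubes; individual benchmarks may define it differently.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """An action label plus one bounding box per frame it covers."""
    label: str
    boxes: Dict[int, Box]  # frame index -> box


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(p: Tubelet, q: Tubelet) -> float:
    """Assumed convention: temporal IoU times mean spatial IoU on shared frames."""
    shared = p.boxes.keys() & q.boxes.keys()
    if not shared:
        return 0.0
    all_frames = p.boxes.keys() | q.boxes.keys()
    temporal = len(shared) / len(all_frames)
    spatial = sum(box_iou(p.boxes[f], q.boxes[f]) for f in shared) / len(shared)
    return temporal * spatial


# Made-up example: ground truth spans frames 0-3, prediction spans frames 2-5.
gt = Tubelet("wave", {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)})
pred = Tubelet("wave", {f: (0.0, 0.0, 10.0, 5.0) for f in range(2, 6)})
score = tube_iou(gt, pred)
print(round(score, 4))
```

The example separates the two questions the task poses: the temporal term measures when the action was found, and the spatial term measures where. Action recognition on a trimmed clip would need neither, only the `label` field.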

Papers

Showing 251-300 of 817 papers

| Title | Status | Hype |
|---|---|---|
| MaCLR: Motion-aware Contrastive Learning of Representations for Videos | Code | 0 |
| A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection | Code | 0 |
| MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos | Code | 0 |
| Emotion Action Detection and Emotion Inference: the Task and Dataset | Code | 0 |
| MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0 |
| Learning to Discriminate Information for Online Action Detection | Code | 0 |
| A Multi-Task Learning Approach for Human Activity Segmentation and Ergonomics Risk Assessment | Code | 0 |
| Learning Spatio-Temporal Representation with Local and Global Diffusion | Code | 0 |
| Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations | Code | 0 |
| JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts | Code | 0 |
| Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection | Code | 0 |
| Evaluation of Noise Reduction Methods for Sentence Recognition by Sinhala Speaking Listeners | Code | 0 |
| Identifying Visible Actions in Lifestyle Vlogs | Code | 0 |
| Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Code | 0 |
| A Comprehensive Study on Temporal Modeling for Online Action Detection | Code | 0 |
| Human Activity Recognition: A Spatio-temporal Image Encoding of 3D Skeleton Data for Online Action Detection | Code | 0 |
| Incremental Tube Construction for Human Action Detection | Code | 0 |
| ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios | Code | 0 |
| Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms | Code | 0 |
| Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation | Code | 0 |
| Handwashing Action Detection System for an Autonomous Social Robot | Code | 0 |
| Estimation of Reliable Proposal Quality for Temporal Action Detection | Code | 0 |
| End-to-end Learning of Action Detection from Frame Glimpses in Videos | Code | 0 |
| Am I Done? Predicting Action Progress in Videos | Code | 0 |
| Fine-Grained Classroom Activity Detection from Audio with Neural Networks | Code | 0 |
| FunASR: A Fundamental End-to-End Speech Recognition Toolkit | Code | 0 |
| Fine-grained Activity Recognition in Baseball Videos | Code | 0 |
| Gan-Based Joint Activity Detection and Channel Estimation For Grant-free Random Access | Code | 0 |
| Graph Distillation for Action Detection with Privileged Modalities | Code | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0 |
| Long-term Conversation Analysis: Exploring Utility and Privacy | Code | 0 |
| EMO&LY (EMOtion and AnomaLY): A new corpus for anomaly detection in an audiovisual stream with emotional context. |  | 0 |
| EML Online Speech Activity Detection for the Fearless Steps Challenge Phase-III |  | 0 |
| EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts |  | 0 |
| Automatic Speech Recognition for Hindi |  | 0 |
| A Hybrid Graph Network for Complex Activity Detection in Video |  | 0 |
| Ego-Only: Egocentric Action Detection without Exocentric Transferring |  | 0 |
| Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization |  | 0 |
| Efficient Activity Detection in Untrimmed Video with Max-Subgraph Search |  | 0 |
| Automated speech tools for helping communities process restricted-access corpora for language revival efforts |  | 0 |
| ActionSpotter: Deep Reinforcement Learning Framework for Temporal Action Spotting in Videos |  | 0 |
| Efficient Action Detection in Untrimmed Videos via Multi-Task Learning |  | 0 |
| Effective Abnormal Activity Detection on Multivariate Time Series Healthcare Data |  | 0 |
| A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments |  | 0 |
| Early Detection of In-Memory Malicious Activity based on Run-time Environmental Features |  | 0 |
| Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation |  | 0 |
| A Grammatical Compositional Model for Video Action Detection |  | 0 |
| Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection |  | 0 |
| Dual DETRs for Multi-Label Temporal Action Detection |  | 0 |
| Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion |  | 0 |

Benchmark Results

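| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STAR/L | Frame-mAP 0.5 | 90.3 |  | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified |
| 3 | YOWO + LFB | Frame-mAP 0.5 | 87.3 |  | Unverified |
| 4 | HIT | Frame-mAP 0.5 | 84.8 |  | Unverified |
| 5 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 82.3 |  | Unverified |
| 6 | YOWO | Frame-mAP 0.5 | 80.4 |  | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 |  | Unverified |
| 8 | MOC | Frame-mAP 0.5 | 77.8 |  | Unverified |
| 9 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 76.3 |  | Unverified |
| 10 | Two-in-one | Video-mAP 0.2 | 75.48 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SiA | Frame-mAP 0.5 | 88.5 |  | Unverified |
| 2 | HISAN (ResNet-101 + FPN) | Video-mAP 0.2 | 87.59 |  | Unverified |
| 3 | HIT | Frame-mAP 0.5 | 83.8 |  | Unverified |
| 4 | HISAN (VGG-16) | Frame-mAP 0.5 | 76.72 |  | Unverified |
| 5 | DTS | Video-mAP 0.2 | 76.1 |  | Unverified |
| 6 | YOWO + LFB | Frame-mAP 0.5 | 75.7 |  | Unverified |
| 7 | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 |  | Unverified |
| 8 | YOWO | Frame-mAP 0.5 | 74.4 |  | Unverified |
| 9 | MOC | Frame-mAP 0.5 | 74 |  | Unverified |
| 10 | Faster-RCNN + two-stream I3D conv | Frame-mAP 0.5 | 73.3 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TTM | mAP | 28.79 |  | Unverified |
| 2 | CTRN | mAP | 27.8 |  | Unverified |
| 3 | Coarse-Fine Networks (w/ self-supervised detection pretraining) | mAP | 26.95 |  | Unverified |
| 4 | UniMD+Sync. (RGB+Flow) | mAP | 26.53 |  | Unverified |
| 5 | PDAN (RGB+Flow) | mAP | 26.5 |  | Unverified |
| 6 | PAT | mAP | 26.5 |  | Unverified |
| 7 | MS-TCT (RGB only) | mAP | 25.4 |  | Unverified |
| 8 | 3D ResNet-50 + super-events pretrained on AViD | mAP | 25.2 |  | Unverified |
| 9 | Coarse-Fine Networks | mAP | 25.1 |  | Unverified |
| 10 | I3D + biGRU + VS-ST-MPNN | mAP | 23.7 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MLAD | mAP | 51.5 |  | Unverified |
| 2 | CTRN | mAP | 51.2 |  | Unverified |
| 3 | PDAN | mAP | 47.6 |  | Unverified |
| 4 | TGM | mAP | 46.4 |  | Unverified |
| 5 | MS-TCT (RGB only) | mAP | 43.1 |  | Unverified |
| 6 | I3D + our super-event | mAP | 36.4 |  | Unverified |
| 7 | Two-stream + LSTM | mAP | 28.1 |  | Unverified |
| 8 | Two-stream | mAP | 27.6 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 |  | Unverified |
| 2 | DTS | Video-mAP 0.2 | 94.3 |  | Unverified |
| 3 | Two-in-one | Video-mAP 0.5 | 92.74 |  | Unverified |
| 4 | T-CNN | Frame-mAP 0.5 | 86.7 |  | Unverified |
| 5 | MR-TS R-CNN | Frame-mAP 0.5 | 84.52 |  | Unverified |
| 6 | TS R-CNN | Frame-mAP 0.5 | 82.3 |  | Unverified |
| 7 | Action Tubes | Frame-mAP 0.5 | 68.1 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MAT (Ours) Trans | mAP | 71.6 |  | Unverified |
| 2 | TadML-two stream | mAP | 59.7 |  | Unverified |
| 3 | MAT (ours) | mAP | 58.2 |  | Unverified |
| 4 | TadML-rgb | mAP | 53.46 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | HIT | Frame-mAP 0.5 | 33.3 |  | Unverified |
| 2 | SiA | Frame-mAP 0.5 | 28.8 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MS-TCT | Frame-mAP | 33.7 |  | Unverified |
| 2 | PDAN | Frame-mAP | 32.7 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN | IoU | 0.14 |  | Unverified |
| 2 | Two Stream Network | IoU | 0.07 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STCNN-V2 (Vote decision) | IoU | 0.52 |  | Unverified |
| 2 | RGB and PRGB | IoU | 0.35 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PAT | mAP | 44.6 |  | Unverified |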