SOTAVerified

Action Recognition In Videos

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
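As a rough illustration of "mapping to a predefined set of action classes", the sketch below assumes some model has already produced one score per class for every frame of a clip. It averages those scores over time and picks the highest-scoring class. The function name `classify_clip`, the three-class label set, and the example scores are all hypothetical, and temporal averaging is only one simple aggregation choice, not the method of any paper listed on this page.

```python
from typing import List

# Hypothetical label set; the three names are the examples from the task description.
ACTION_CLASSES = ["running", "jumping", "swimming"]


def classify_clip(frame_scores: List[List[float]]) -> str:
    """Average per-frame class scores over time and return the top class name."""
    if not frame_scores:
        raise ValueError("clip has no frames")
    num_classes = len(ACTION_CLASSES)
    totals = [0.0] * num_classes
    for scores in frame_scores:
        if len(scores) != num_classes:
            raise ValueError("each frame needs one score per class")
        for i, score in enumerate(scores):
            totals[i] += score
    # Temporal average pooling: one mean score per class across all frames.
    means = [total / len(frame_scores) for total in totals]
    best = max(range(num_classes), key=lambda i: means[i])
    return ACTION_CLASSES[best]


# Made-up scores for a three-frame clip (assumption, not real model output).
clip = [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.5, 0.4, 0.1]]
print(classify_clip(clip))  # prints "jumping"
```

Real systems replace the hand-written scores with the output of a video model and often use learned temporal modeling instead of a plain average.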

Papers

Showing 51-100 of 124 papers

Title | Status | Hype
Co-training Transformer with Videos and Images Improves Action Recognition | | 0
Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing | | 0
Class incremental learning for video action classification | | 0
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition | | 0
Video Transformer Network | Code | 0
Temporal Difference Networks for Action Recognition | | 0
Towards Improving Spatiotemporal Action Recognition in Videos | Code | 0
Developing Motion Code Embedding for Action Recognition in Videos | | 0
Pose And Joint-Aware Action Recognition | Code | 0
Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes | | 0
Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition | | 0
Self-Supervised MultiModal Versatile Networks | Code | 0
Dynamic Sampling Networks for Efficient Action Recognition in Videos | | 0
Learn to cycle: Time-consistent feature discovery for action recognition | Code | 0
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View | | 0
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition | | 0
An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos | | 0
Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues | | 0
Gating Revisited: Deep Multi-layer RNNs That Can Be Trained | Code | 0
Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition | | 0
MMTM: Multimodal Transfer Module for CNN Fusion | Code | 0
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | Code | 0
Zero-Shot Action Recognition in Videos: A Survey | | 0
Discriminative Video Representation Learning Using Support Vector Classifiers | | 0
STM: SpatioTemporal and Motion Encoding for Action Recognition | | 0
Collaborative Spatiotemporal Feature Learning for Video Action Recognition | Code | 0
What Makes Training Multi-Modal Classification Networks Hard? | Code | 0
Learning Video Representations from Correspondence Proposals | Code | 0
Where and when to look? Spatial-temporal attention for action recognition in videos | | 0
Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition | Code | 0
Resource Efficient 3D Convolutional Neural Networks | Code | 0
Robust Real-Time Violence Detection in Video Using CNN And LSTM | Code | 0
Collaborative Spatio-temporal Feature Learning for Video Action Recognition | Code | 0
Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision | | 0
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | | 0
Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks | | 0
Coupled Recurrent Network (CRN) | | 0
Evolving Space-Time Neural Architectures for Videos | | 0
Representation Flow for Action Recognition | Code | 0
Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos | | 0
Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos | | 0
Motion Feature Network: Fixed Motion Filter for Action Recognition | | 0
Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks | | 0
Pose-Based Two-Stream Relational Networks for Action Recognition in Videos | | 0
DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding | | 0
Visual Attribute-augmented Three-dimensional Convolutional Neural Network for Enhanced Human Action Recognition | | 0
Video Representation Learning Using Discriminative Pooling | | 0
2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning | Code | 0
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition | | 0
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CPNet Res34, 5 CP | Val | 96.7 | | Unverified
2 | STM (Resnet-50, 16 frames) | Val | 96.7 | | Unverified
3 | MFNet | Val | 96.68 | | Unverified
4 | DIN | Val | 95.31 | | Unverified
5 | MultiScale TRN | Val | 95.31 | | Unverified
6 | convSTAR | Val | 92.7 | | Unverified
7 | 3D-SqueezeNet | Val | 90.77 | | Unverified
8 | 3D-ShuffleNetV2 0.25x | Val | 86.91 | | Unverified
9 | 3D-MobileNetV2 0.2x | Val | 86.43 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DSCNet (RGB + Pose) | X-Sub | 97.4 | | Unverified
2 | MMNet | X-Sub | 97.4 | | Unverified
3 | EPAM-Net | X-Sub | 96.2 | | Unverified
4 | DVANet (RGB only) | X-Sub | 95.8 | | Unverified
5 | TSMF | X-Sub | 95.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | 3-fold Accuracy | 96.2 | | Unverified
2 | 3D-SqueezeNet | 3-fold Accuracy | 74.94 | | Unverified
3 | 3D-ShuffleNetV2 0.25x | 3-fold Accuracy | 56.52 | | Unverified
4 | 3D-MobileNetV2 0.2x | 3-fold Accuracy | 55.56 | | Unverified
5 | Baseline UCF101 | 3-fold Accuracy | 43.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top-1 Accuracy | 64.2 | | Unverified
2 | CPNet Res34, 5 CP | Top-1 Accuracy | 57.65 | | Unverified
3 | 2-Stream TRN | Top-1 Accuracy | 55.52 | | Unverified
4 | DIN | Top-1 Accuracy | 34.11 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 86.5 | | Unverified
2 | ActionCLIP (ViT-B/16) | Top-1 Accuracy | 83.8 | | Unverified
3 | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | Top-1 Accuracy | 81.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 20.2 | | Unverified
2 | VideoMAE V2 | mAP (Val) | 18.24 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 49.2 | | Unverified
2 | OTAM[3]++ | Top-1 Accuracy (5-Way-1-Shot) | 42.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 39.8 | | Unverified
2 | CMN[35] | Top-1 Accuracy (5-Way-1-Shot) | 36.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Video hit@1 | 74.8 | | Unverified
2 | LSTM + Pretrained on YT-8M | Video hit@1 | 65.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Single-stream R-C3D (two-way buffer) | mAP@0.1 | 54.5 | | Unverified
2 | Single-stream R-C3D (one-way buffer) | mAP@0.1 | 51.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM + Pretrained on YT-8M | mAP | 75.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 19.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | Average accuracy of 3 splits | 72.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 87.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Clip Hit@1 | 49.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 2D-3D-Softargmax (RGB only) | Accuracy (CS) | 85.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top 1 Accuracy | 50.7 | | Unverified