SOTAVerified

Action Recognition In Videos

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
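
As a rough illustration of mapping a video to one of a predefined set of action classes, the sketch below averages per-frame class probabilities over time and picks the highest-scoring class. It is a minimal late-fusion baseline under stated assumptions, not the method of any paper listed here: the class list reuses the examples from the description above, the per-frame scores are made-up numbers standing in for a model's outputs, and `classify_clip` is a hypothetical helper name. Simple averaging also ignores the ordering of frames, so it does not model temporal dynamics the way the listed architectures aim to.

```python
import math

# Example classes taken from the task description; any fixed label set would do.
ACTION_CLASSES = ["running", "jumping", "swimming"]


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_scores, classes=ACTION_CLASSES):
    """Average per-frame class probabilities and return (label, mean_probs).

    frame_scores: one list of raw class scores per frame (hypothetical
    model outputs; a real system would produce these from video frames).
    """
    if not frame_scores:
        raise ValueError("clip must contain at least one frame")
    probs = [softmax(f) for f in frame_scores]
    n = len(probs)
    mean = [sum(p[i] for p in probs) / n for i in range(len(classes))]
    best = max(range(len(classes)), key=mean.__getitem__)
    return classes[best], mean


# Made-up scores for a three-frame clip, for illustration only.
clip = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [2.2, 0.3, 0.0]]
label, mean_probs = classify_clip(clip)
print(label)
```

The Top-1 accuracy figures reported in the benchmark tables count how often a prediction of this kind matches the ground-truth label.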

Papers

Showing 51-100 of 124 papers

Title | Status | Hype
Region-based Non-local Operation for Video Classification | Code | 1
IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos | Code | 1
Self-Supervised MultiModal Versatile Networks | Code | 0
Dynamic Sampling Networks for Efficient Action Recognition in Videos | - | 0
Unsupervised Learning of Video Representations via Dense Trajectory Clustering | Code | 1
Learn to cycle: Time-consistent feature discovery for action recognition | Code | 0
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View | - | 0
TEA: Temporal Excitation and Aggregation for Action Recognition | Code | 1
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition | - | 0
An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos | - | 0
Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues | - | 0
Gating Revisited: Deep Multi-layer RNNs That Can Be Trained | Code | 0
Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition | - | 0
MMTM: Multimodal Transfer Module for CNN Fusion | Code | 0
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | Code | 0
Zero-Shot Action Recognition in Videos: A Survey | - | 0
Discriminative Video Representation Learning Using Support Vector Classifiers | - | 0
STM: SpatioTemporal and Motion Encoding for Action Recognition | - | 0
Collaborative Spatiotemporal Feature Learning for Video Action Recognition | Code | 0
What Makes Training Multi-Modal Classification Networks Hard? | Code | 0
Learning Video Representations from Correspondence Proposals | Code | 0
Where and when to look? Spatial-temporal attention for action recognition in videos | - | 0
Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition | Code | 0
Resource Efficient 3D Convolutional Neural Networks | Code | 0
Robust Real-Time Violence Detection in Video Using CNN And LSTM | Code | 0
Collaborative Spatio-temporal Feature Learning for Video Action Recognition | Code | 0
Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision | - | 0
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | - | 0
Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks | - | 0
Coupled Recurrent Network (CRN) | - | 0
SlowFast Networks for Video Recognition | Code | 1
Evolving Space-Time Neural Architectures for Videos | - | 0
Representation Flow for Action Recognition | Code | 0
Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos | - | 0
Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos | - | 0
Motion Feature Network: Fixed Motion Filter for Action Recognition | - | 0
Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks | - | 0
Pose-Based Two-Stream Relational Networks for Action Recognition in Videos | - | 0
DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding | - | 0
Visual Attribute-augmented Three-dimensional Convolutional Neural Network for Enhanced Human Action Recognition | - | 0
Video Representation Learning Using Discriminative Pooling | - | 0
2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning | Code | 0
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition | - | 0
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition | Code | 0
Temporal Relational Reasoning in Videos | Code | 0
RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos | Code | 0
Two-stream Flow-guided Convolutional Attention Networks for Action Recognition | Code | 0
Discriminative convolutional Fisher vector network for action recognition | - | 0
Developing the Path Signature Methodology and its Application to Landmark-based Human Action Recognition | - | 0
Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CPNet Res34, 5 CP | Val | 96.7 | - | Unverified
2 | STM (Resnet-50, 16 frames) | Val | 96.7 | - | Unverified
3 | MFNet | Val | 96.68 | - | Unverified
4 | DIN | Val | 95.31 | - | Unverified
5 | MultiScale TRN | Val | 95.31 | - | Unverified
6 | convSTAR | Val | 92.7 | - | Unverified
7 | 3D-SqueezeNet | Val | 90.77 | - | Unverified
8 | 3D-ShuffleNetV2 0.25x | Val | 86.91 | - | Unverified
9 | 3D-MobileNetV2 0.2x | Val | 86.43 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DSCNet (RGB + Pose) | X-Sub | 97.4 | - | Unverified
2 | MMNet | X-Sub | 97.4 | - | Unverified
3 | EPAM-Net | X-Sub | 96.2 | - | Unverified
4 | DVANet (RGB only) | X-Sub | 95.8 | - | Unverified
5 | TSMF | X-Sub | 95.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | 3-fold Accuracy | 96.2 | - | Unverified
2 | 3D-SqueezeNet | 3-fold Accuracy | 74.94 | - | Unverified
3 | 3D-ShuffleNetV2 0.25x | 3-fold Accuracy | 56.52 | - | Unverified
4 | 3D-MobileNetV2 0.2x | 3-fold Accuracy | 55.56 | - | Unverified
5 | Baseline UCF101 | 3-fold Accuracy | 43.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top-1 Accuracy | 64.2 | - | Unverified
2 | CPNet Res34, 5 CP | Top-1 Accuracy | 57.65 | - | Unverified
3 | 2-Stream TRN | Top-1 Accuracy | 55.52 | - | Unverified
4 | DIN | Top-1 Accuracy | 34.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 86.5 | - | Unverified
2 | ActionCLIP (ViT-B/16) | Top-1 Accuracy | 83.8 | - | Unverified
3 | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | Top-1 Accuracy | 81.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 20.2 | - | Unverified
2 | VideoMAE V2 | mAP (Val) | 18.24 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 49.2 | - | Unverified
2 | OTAM[3]++ | Top-1 Accuracy (5-Way-1-Shot) | 42.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 39.8 | - | Unverified
2 | CMN[35] | Top-1 Accuracy (5-Way-1-Shot) | 36.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Video hit@1 | 74.8 | - | Unverified
2 | LSTM + Pretrained on YT-8M | Video hit@1 | 65.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Single-stream R-C3D (two-way buffer) | mAP@0.1 | 54.5 | - | Unverified
2 | Single-stream R-C3D (one-way buffer) | mAP@0.1 | 51.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM + Pretrained on YT-8M | mAP | 75.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 19.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | Average accuracy of 3 splits | 72.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 87.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Clip Hit@1 | 49.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 2D-3D-Softargmax (RGB only) | Accuracy (CS) | 85.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top 1 Accuracy | 50.7 | - | Unverified