SOTAVerified

Action Recognition In Videos

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
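A minimal sketch of the classification step described above, assuming a simple segment-sampling baseline. The video is split into equal segments, one frame per segment is scored, and the per-class scores are averaged over time before the top class is taken. The `score_frame` callable stands in for a trained per-frame model. It is hypothetical, as are the frame-as-feature-vector format and all function names. Plain averaging ignores temporal order, so stronger models capture the spatiotemporal dynamics more explicitly.

```python
from typing import Callable, List, Sequence

# Hypothetical input format: each frame is already reduced to a feature vector.
Frame = Sequence[float]


def sample_indices(num_frames: int, num_segments: int) -> List[int]:
    """Pick the centre frame of each of `num_segments` equal-length segments."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    step = num_frames / num_segments
    return [min(num_frames - 1, int(step * i + step / 2)) for i in range(num_segments)]


def classify_video(
    frames: Sequence[Frame],
    score_frame: Callable[[Frame], List[float]],  # hypothetical per-frame scorer
    class_names: Sequence[str],
    num_segments: int = 3,
) -> str:
    """Average per-frame class scores over sampled frames and return the top class."""
    idx = sample_indices(len(frames), num_segments)
    per_frame = [score_frame(frames[i]) for i in idx]
    # Temporal average pooling: mean score of each class across the sampled frames.
    avg = [sum(s[c] for s in per_frame) / len(per_frame) for c in range(len(class_names))]
    best = max(range(len(class_names)), key=lambda c: avg[c])
    return class_names[best]
```

With a toy scorer, `classify_video([[0.9]] * 9, lambda f: [f[0], 1 - f[0]], ["running", "swimming"])` samples frames 1, 4 and 7 and returns `"running"`.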

Papers

Showing 51–100 of 124 papers

Title | Status | Hype
Coupled Recurrent Network (CRN) |  | 0
Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition |  | 0
Deep Learning Approaches for Human Action Recognition in Video Data |  | 0
DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding |  | 0
Developing Motion Code Embedding for Action Recognition in Videos |  | 0
Discriminative convolutional Fisher vector network for action recognition |  | 0
Discriminative Video Representation Learning Using Support Vector Classifiers |  | 0
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition |  | 0
Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web |  | 0
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition |  | 0
Dynamic Sampling Networks for Efficient Action Recognition in Videos |  | 0
Evolving Space-Time Neural Architectures for Videos |  | 0
Hierarchical Attention Network for Action Recognition in Videos |  | 0
Knowledge Prompting for Few-shot Action Recognition |  | 0
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition |  | 0
Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks |  | 0
Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision |  | 0
Developing the Path Signature Methodology and its Application to Landmark-based Human Action Recognition |  | 0
Motion Feature Network: Fixed Motion Filter for Action Recognition |  | 0
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition |  | 0
Per-Sample Kernel Adaptation for Visual Recognition and Grouping |  | 0
Pose-Based Two-Stream Relational Networks for Action Recognition in Videos |  | 0
Pose from Action: Unsupervised Learning of Pose Features based on Motion |  | 0
Procedural Generation of Videos to Train Deep Action Recognition Networks |  | 0
Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks |  | 0
Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues |  | 0
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View |  | 0
Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos |  | 0
STM: SpatioTemporal and Motion Encoding for Action Recognition |  | 0
Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition |  | 0
Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing |  | 0
Temporal Difference Networks for Action Recognition |  | 0
Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos |  | 0
The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks |  | 0
Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos |  | 0
Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes |  | 0
Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition |  | 0
Video Representation Learning Using Discriminative Pooling |  | 0
Visual Attribute-augmented Three-dimensional Convolutional Neural Network for Enhanced Human Action Recognition |  | 0
Where and when to look? Spatial-temporal attention for action recognition in videos |  | 0
Collaborative Spatiotemporal Feature Learning for Video Action Recognition | Code | 0
Collaborative Spatio-temporal Feature Learning for Video Action Recognition | Code | 0
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer | Code | 0
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | Code | 0
HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition | Code | 0
Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition | Code | 0
ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos | Code | 0
Temporal Relational Reasoning in Videos | Code | 0
Gating Revisited: Deep Multi-layer RNNs That Can Be Trained | Code | 0
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CPNet Res34, 5 CP | Val | 96.7 |  | Unverified
2 | STM (Resnet-50, 16 frames) | Val | 96.7 |  | Unverified
3 | MFNet | Val | 96.68 |  | Unverified
4 | DIN | Val | 95.31 |  | Unverified
5 | MultiScale TRN | Val | 95.31 |  | Unverified
6 | convSTAR | Val | 92.7 |  | Unverified
7 | 3D-SqueezeNet | Val | 90.77 |  | Unverified
8 | 3D-ShuffleNetV2 0.25x | Val | 86.91 |  | Unverified
9 | 3D-MobileNetV2 0.2x | Val | 86.43 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DSCNet (RGB + Pose) | X-Sub | 97.4 |  | Unverified
2 | MMNet | X-Sub | 97.4 |  | Unverified
3 | EPAM-Net | X-Sub | 96.2 |  | Unverified
4 | DVANet (RGB only) | X-Sub | 95.8 |  | Unverified
5 | TSMF | X-Sub | 95.8 |  | Unverified
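
"X-Sub" most likely denotes cross-subject evaluation, in which the people performing actions in the test videos never appear in the training videos. The sketch below shows only that splitting rule. Benchmarks fix their own list of training subjects, so the `train_subjects` set, the `(sample_id, subject_id)` record format and the function name are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical record format: (sample_id, subject_id). Real datasets encode
# the performer ID in their own way (for example inside file names).
Sample = Tuple[str, int]


def cross_subject_split(
    samples: Iterable[Sample], train_subjects: Set[int]
) -> Tuple[List[str], List[str]]:
    """Split samples so that no performer appears in both train and test."""
    train: List[str] = []
    test: List[str] = []
    for sample_id, subject in samples:
        # The assignment depends only on who performs the action.
        (train if subject in train_subjects else test).append(sample_id)
    return train, test
```

For example, `cross_subject_split([("a", 1), ("b", 2), ("c", 1), ("d", 3)], {1})` puts `a` and `c` in training and `b` and `d` in test. Accuracy is then computed on the test side only.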

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | 3-fold Accuracy | 96.2 |  | Unverified
2 | 3D-SqueezeNet | 3-fold Accuracy | 74.94 |  | Unverified
3 | 3D-ShuffleNetV2 0.25x | 3-fold Accuracy | 56.52 |  | Unverified
4 | 3D-MobileNetV2 0.2x | 3-fold Accuracy | 55.56 |  | Unverified
5 | Baseline UCF101 | 3-fold Accuracy | 43.9 |  | Unverified
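
A sketch of how a "3-fold Accuracy" figure is typically formed. This assumes, as the related "Average accuracy of 3 splits" metric suggests, that top-1 accuracy is computed separately on each of three train/test splits and then averaged. Predictions are class indices, and the function names are illustrative rather than any benchmark's official evaluation code.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def top1_accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of test videos whose highest-scoring class equals the label."""
    if not labels or len(predicted) != len(labels):
        raise ValueError("need one prediction per label")
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def mean_split_accuracy(splits: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average top-1 accuracy over independently evaluated splits."""
    return mean(top1_accuracy(pred, gold) for pred, gold in splits)
```

Leaderboards report this value as a percentage, so a result of 0.962 would appear as 96.2.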

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top-1 Accuracy | 64.2 |  | Unverified
2 | CPNet Res34, 5 CP | Top-1 Accuracy | 57.65 |  | Unverified
3 | 2-Stream TRN | Top-1 Accuracy | 55.52 |  | Unverified
4 | DIN | Top-1 Accuracy | 34.11 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 86.5 |  | Unverified
2 | ActionCLIP (ViT-B/16) | Top-1 Accuracy | 83.8 |  | Unverified
3 | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | Top-1 Accuracy | 81.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 20.2 |  | Unverified
2 | VideoMAE V2 | mAP (Val) | 18.24 |  | Unverified
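
A simplified sketch of mean average precision (mAP). For each class, predictions are ranked by score, the precision is recorded at the rank of every true positive, and those precisions are averaged. mAP is then the unweighted mean over classes. This is an assumption-laden illustration. Detection-style benchmarks additionally match predictions to ground truth by spatial or temporal overlap before ranking, and official evaluation code may use an interpolated variant of AP.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive."""
    # Rank by descending score; ties keep their input order (sorted is stable).
    ranked = sorted(zip(scores, is_positive), key=lambda t: t[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precision_sum += hits / rank
    num_positive = sum(bool(p) for p in is_positive)
    return precision_sum / num_positive if num_positive else 0.0


def mean_average_precision(per_class: List[Tuple[Sequence[float], Sequence[bool]]]) -> float:
    """mAP: unweighted mean of per-class AP."""
    aps = [average_precision(s, p) for s, p in per_class]
    return sum(aps) / len(aps)
```

For scores `[0.9, 0.8, 0.7, 0.6]` with positives at ranks 1 and 3, AP is (1/1 + 2/3) / 2 ≈ 0.833.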

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 49.2 |  | Unverified
2 | OTAM[3]++ | Top-1 Accuracy (5-Way-1-Shot) | 42.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 39.8 |  | Unverified
2 | CMN[35] | Top-1 Accuracy (5-Way-1-Shot) | 36.2 |  | Unverified
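
"5-Way-1-Shot" refers to few-shot evaluation episodes. Each episode draws 5 unseen classes and gives the model 1 labeled support video per class. The model must then classify held-out query videos from those same classes. The sketch below draws one such episode. The number of query videos per class, the dictionary-of-video-IDs input and the function name are assumptions. Reported accuracy is usually top-1 on the queries, averaged over many episodes.

```python
import random
from typing import Dict, List, Sequence, Tuple

Labeled = Tuple[str, str]  # (video_id, class_name)


def sample_episode(
    videos_by_class: Dict[str, Sequence[str]],
    n_way: int,
    k_shot: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[List[Labeled], List[Labeled]]:
    """Draw one N-way K-shot episode: a labeled support set and a query set."""
    # Pick N distinct classes (sorted first so the result depends only on the seed).
    classes = rng.sample(sorted(videos_by_class), n_way)
    support: List[Labeled] = []
    query: List[Labeled] = []
    for label in classes:
        # Draw disjoint support and query videos from the same class.
        picked = rng.sample(list(videos_by_class[label]), k_shot + n_query)
        support.extend((v, label) for v in picked[:k_shot])
        query.extend((v, label) for v in picked[k_shot:])
    return support, query
```

With `n_way=5, k_shot=1, n_query=2`, each episode yields 5 support videos (one per class) and 10 query videos.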

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Video hit@1 | 74.8 |  | Unverified
2 | LSTM + Pretrained on YT-8M | Video hit@1 | 65.7 |  | Unverified
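
Hit@1, as commonly used for multi-label video benchmarks, counts a prediction as correct when the top-ranked class is any one of the item's ground-truth labels. "Clip hit@1" scores each short clip on its own. "Video hit@1" scores one prediction per video. This sketch assumes video-level scores come from averaging clip scores, a common choice that individual papers may replace with another aggregation. The data layout and function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple

# One video: per-clip class scores plus the set of ground-truth class indices.
Video = Tuple[Sequence[Sequence[float]], Set[int]]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda c: scores[c])


def hit_at_1(videos: List[Video]) -> Tuple[float, float]:
    """Return (clip hit@1, video hit@1) over a list of videos."""
    clip_hits = clip_total = video_hits = 0
    for clip_scores, labels in videos:
        # Clip level: every clip is judged on its own top prediction.
        for scores in clip_scores:
            clip_hits += argmax(scores) in labels
            clip_total += 1
        # Video level (assumed aggregation): average clip scores, then take the top class.
        num_classes = len(clip_scores[0])
        averaged = [sum(s[c] for s in clip_scores) / len(clip_scores) for c in range(num_classes)]
        video_hits += argmax(averaged) in labels
    return clip_hits / clip_total, video_hits / len(videos)
```

Averaging lets a few wrong clips be outvoted. In the toy case below, one of four clips is wrong but both videos are right, giving `(0.75, 1.0)`. This is why video-level figures tend to exceed clip-level ones.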

# | Model | Metric | Claimed | Verified | Status
1 | Single-stream R-C3D (two-way buffer) | mAP@0.1 | 54.5 |  | Unverified
2 | Single-stream R-C3D (one-way buffer) | mAP@0.1 | 51.6 |  | Unverified
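
Reading "mAP@0.1" as mAP at a temporal-IoU (tIoU) threshold of 0.1 is an assumption. Under that reading, a detected action segment can count as a true positive only if it overlaps a ground-truth segment with tIoU of at least 0.1. The sketch covers only that overlap test. A full evaluation also checks the class, matches each ground-truth segment at most once, and ranks detections by confidence before computing AP. Segment times in seconds and the function names are illustrative.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start_time, end_time), e.g. in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def matches(pred: Segment, gt: Segment, threshold: float = 0.1) -> bool:
    """A detection is eligible to be a true positive only if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold
```

For example, segments (0, 10) and (5, 15) share 5 seconds out of a 15-second union, so tIoU is 1/3. They match at threshold 0.1 or 0.3 but not at 0.5.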

# | Model | Metric | Claimed | Verified | Status
1 | LSTM + Pretrained on YT-8M | mAP | 75.6 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 19.2 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | Average accuracy of 3 splits | 72.2 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 87.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Clip Hit@1 | 49.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 2D-3D-Softargmax (RGB only) | Accuracy (CS) | 85.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top 1 Accuracy | 50.7 |  | Unverified