SOTAVerified

Action Recognition In Videos

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
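As a rough illustration of the mapping described above, the sketch below turns per-frame class scores into a single clip-level label and computes a Top-1 accuracy figure of the kind reported in the benchmark tables. The label list, the function names, and the choice to average scores over time are all assumptions made for illustration. Real models learn spatiotemporal features rather than averaging fixed scores.

```python
from typing import Sequence

# Hypothetical label set; the three names come from the task description.
ACTION_CLASSES = ["running", "jumping", "swimming"]


def classify_clip(frame_scores: Sequence[Sequence[float]]) -> str:
    """Average per-frame class scores over time and return the top class.

    Temporal averaging is a simplifying assumption, not a method taken
    from any paper listed on this page.
    """
    if not frame_scores:
        raise ValueError("clip has no frames")
    num_classes = len(ACTION_CLASSES)
    totals = [0.0] * num_classes
    for scores in frame_scores:
        if len(scores) != num_classes:
            raise ValueError("each frame needs one score per class")
        for i, score in enumerate(scores):
            totals[i] += score
    means = [total / len(frame_scores) for total in totals]
    best = max(range(num_classes), key=means.__getitem__)
    return ACTION_CLASSES[best]


def top1_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of clips whose predicted class equals the true class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty sequences")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    clip = [[0.2, 0.7, 0.1], [0.1, 0.6, 0.3]]  # two frames, three classes
    print(classify_clip(clip))
    print(top1_accuracy(["jumping", "running"], ["jumping", "swimming"]))
```

Scores are averaged before the argmax so that a single noisy frame cannot decide the clip label on its own. Other pooling schemes would fit the same interface.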

Papers

Showing 51–75 of 124 papers

Title | Status | Hype
Learning Video Representations from Correspondence Proposals | Code | 0
Learn to cycle: Time-consistent feature discovery for action recognition | Code | 0
MMTM: Multimodal Transfer Module for CNN Fusion | Code | 0
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition | Code | 0
Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition | Code | 0
Pose And Joint-Aware Action Recognition | Code | 0
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection | Code | 0
Representation Flow for Action Recognition | Code | 0
Resource Efficient 3D Convolutional Neural Networks | Code | 0
Robust Real-Time Violence Detection in Video Using CNN And LSTM | Code | 0
RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos | Code | 0
Self-Supervised MultiModal Versatile Networks | Code | 0
Temporal Relational Reasoning in Videos | Code | 0
Towards Improving Spatiotemporal Action Recognition in Videos | Code | 0
Two-Stream Convolutional Networks for Action Recognition in Videos | Code | 0
Two-stream Flow-guided Convolutional Attention Networks for Action Recognition | Code | 0
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild | Code | 0
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer | Code | 0
Video Transformer Network | Code | 0
What Makes Training Multi-Modal Classification Networks Hard? | Code | 0
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | Code | 0
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition | - | 0
Action Class Relation Detection and Classification Across Multiple Video Datasets | - | 0
A Multi-Stream Bi-Directional Recurrent Neural Network for Fine-Grained Action Detection | - | 0
Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CPNet Res34, 5 CP | Val | 96.7 | - | Unverified
2 | STM (Resnet-50, 16 frames) | Val | 96.7 | - | Unverified
3 | MFNet | Val | 96.68 | - | Unverified
4 | DIN | Val | 95.31 | - | Unverified
5 | MultiScale TRN | Val | 95.31 | - | Unverified
6 | convSTAR | Val | 92.7 | - | Unverified
7 | 3D-SqueezeNet | Val | 90.77 | - | Unverified
8 | 3D-ShuffleNetV2 0.25x | Val | 86.91 | - | Unverified
9 | 3D-MobileNetV2 0.2x | Val | 86.43 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DSCNet (RGB + Pose) | X-Sub | 97.4 | - | Unverified
2 | MMNet | X-Sub | 97.4 | - | Unverified
3 | EPAM-Net | X-Sub | 96.2 | - | Unverified
4 | DVANet (RGB only) | X-Sub | 95.8 | - | Unverified
5 | TSMF | X-Sub | 95.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | 3-fold Accuracy | 96.2 | - | Unverified
2 | 3D-SqueezeNet | 3-fold Accuracy | 74.94 | - | Unverified
3 | 3D-ShuffleNetV2 0.25x | 3-fold Accuracy | 56.52 | - | Unverified
4 | 3D-MobileNetV2 0.2x | 3-fold Accuracy | 55.56 | - | Unverified
5 | Baseline UCF101 | 3-fold Accuracy | 43.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top-1 Accuracy | 64.2 | - | Unverified
2 | CPNet Res34, 5 CP | Top-1 Accuracy | 57.65 | - | Unverified
3 | 2-Stream TRN | Top-1 Accuracy | 55.52 | - | Unverified
4 | DIN | Top-1 Accuracy | 34.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 86.5 | - | Unverified
2 | ActionCLIP (ViT-B/16) | Top-1 Accuracy | 83.8 | - | Unverified
3 | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | Top-1 Accuracy | 81.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 20.2 | - | Unverified
2 | VideoMAE V2 | mAP (Val) | 18.24 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 49.2 | - | Unverified
2 | OTAM[3]++ | Top-1 Accuracy (5-Way-1-Shot) | 42.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 39.8 | - | Unverified
2 | CMN[35] | Top-1 Accuracy (5-Way-1-Shot) | 36.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Video hit@1 | 74.8 | - | Unverified
2 | LSTM + Pretrained on YT-8M | Video hit@1 | 65.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Single-stream R-C3D (two-way buffer) | mAP@0.1 | 54.5 | - | Unverified
2 | Single-stream R-C3D (one-way buffer) | mAP@0.1 | 51.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM + Pretrained on YT-8M | mAP | 75.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 19.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | Average accuracy of 3 splits | 72.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 87.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Clip Hit@1 | 49.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 2D-3D-Softargmax (RGB only) | Accuracy (CS) | 85.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top 1 Accuracy | 50.7 | - | Unverified