SOTAVerified

Action Recognition In Videos

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
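As a rough illustration of the mapping described above, the sketch below averages per-frame class scores over time and picks the highest-scoring class, then computes a top-1 accuracy of the kind reported in the benchmark tables. It is only a minimal sketch: the label set, the function names classify_clip and top1_accuracy, and the use of simple temporal averaging are assumptions made for illustration, not the method of any paper or model listed on this page.

```python
from typing import Sequence

# Hypothetical label set, taken from the examples in the task description.
ACTION_CLASSES = ["running", "jumping", "swimming"]


def classify_clip(frame_scores: Sequence[Sequence[float]]) -> str:
    """Average per-frame class scores over time and return the top class.

    frame_scores holds one row per frame and one score per action class.
    Temporal averaging is an assumption; real models aggregate time in
    many other ways (3D convolutions, attention, recurrent layers, ...).
    """
    if not frame_scores:
        raise ValueError("clip has no frames")
    num_frames = len(frame_scores)
    mean_scores = [
        sum(frame[c] for frame in frame_scores) / num_frames
        for c in range(len(ACTION_CLASSES))
    ]
    best = max(range(len(ACTION_CLASSES)), key=mean_scores.__getitem__)
    return ACTION_CLASSES[best]


def top1_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of clips whose predicted class equals the true class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```

For example, a two-frame clip whose scores favour the second class in both frames would be labelled "jumping", and one correct prediction out of two clips gives a top-1 accuracy of 50.0.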

Papers

Showing 51-60 of 124 papers

Title | Status | Hype
Co-training Transformer with Videos and Images Improves Action Recognition | | 0
Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing | | 0
Class incremental learning for video action classification | | 0
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition | | 0
Video Transformer Network | Code | 0
Temporal Difference Networks for Action Recognition | | 0
Towards Improving Spatiotemporal Action Recognition in Videos | Code | 0
Developing Motion Code Embedding for Action Recognition in Videos | | 0
Pose And Joint-Aware Action Recognition | Code | 0
Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CPNet Res34, 5 CP | Val | 96.7 | | Unverified
2 | STM (Resnet-50, 16 frames) | Val | 96.7 | | Unverified
3 | MFNet | Val | 96.68 | | Unverified
4 | DIN | Val | 95.31 | | Unverified
5 | MultiScale TRN | Val | 95.31 | | Unverified
6 | convSTAR | Val | 92.7 | | Unverified
7 | 3D-SqueezeNet | Val | 90.77 | | Unverified
8 | 3D-ShuffleNetV2 0.25x | Val | 86.91 | | Unverified
9 | 3D-MobileNetV2 0.2x | Val | 86.43 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | MMNet | X-Sub | 97.4 | | Unverified
2 | DSCNet (RGB + Pose) | X-Sub | 97.4 | | Unverified
3 | EPAM-Net | X-Sub | 96.2 | | Unverified
4 | DVANet (RGB only) | X-Sub | 95.8 | | Unverified
5 | TSMF | X-Sub | 95.8 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | 3-fold Accuracy | 96.2 | | Unverified
2 | 3D-SqueezeNet | 3-fold Accuracy | 74.94 | | Unverified
3 | 3D-ShuffleNetV2 0.25x | 3-fold Accuracy | 56.52 | | Unverified
4 | 3D-MobileNetV2 0.2x | 3-fold Accuracy | 55.56 | | Unverified
5 | Baseline UCF101 | 3-fold Accuracy | 43.9 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top-1 Accuracy | 64.2 | | Unverified
2 | CPNet Res34, 5 CP | Top-1 Accuracy | 57.65 | | Unverified
3 | 2-Stream TRN | Top-1 Accuracy | 55.52 | | Unverified
4 | DIN | Top-1 Accuracy | 34.11 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 86.5 | | Unverified
2 | ActionCLIP (ViT-B/16) | Top-1 Accuracy | 83.8 | | Unverified
3 | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | Top-1 Accuracy | 81.7 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 20.2 | | Unverified
2 | VideoMAE V2 | mAP (Val) | 18.24 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 49.2 | | Unverified
2 | OTAM[3]++ | Top-1 Accuracy (5-Way-1-Shot) | 42.8 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ITANet | Top-1 Accuracy (5-Way-1-Shot) | 39.8 | | Unverified
2 | CMN[35] | Top-1 Accuracy (5-Way-1-Shot) | 36.2 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Video hit@1 | 74.8 | | Unverified
2 | LSTM + Pretrained on YT-8M | Video hit@1 | 65.7 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Single-stream R-C3D (two-way buffer) | mAP@0.1 | 54.5 | | Unverified
2 | Single-stream R-C3D (one-way buffer) | mAP@0.1 | 51.6 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LSTM + Pretrained on YT-8M | mAP | 75.6 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | YOWO+LFB* | mAP (Val) | 19.2 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STM (ImageNet+Kinetics pretrain) | Average accuracy of 3 splits | 72.2 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Florence | Top-1 Accuracy | 87.8 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | G-Blend | Clip Hit@1 | 49.7 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | 2D-3D-Softargmax (RGB only) | Accuracy (CS) | 85.5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | STM (16 frames, ImageNet pretraining) | Top 1 Accuracy | 50.7 | | Unverified