SOTAVerified

Action Recognition

Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.

In the video domain, it remains an open question whether training an action classification network on a sufficiently large dataset will give a performance boost, comparable to what large-scale pretraining provides for image tasks, when the network is applied to a different temporal task or dataset. The difficulty of building video datasets has meant that most popular benchmarks for action recognition are small, on the order of 10k videos.

Note that some benchmarks, e.g. Kinetics-400, may be listed under the Action Classification or Video Classification tasks instead.

Papers

Showing 1376–1400 of 2759 papers

| Title | Status | Hype |
|---|---|---|
| Collecting and Annotating the Large Continuous Action Dataset | | 0 |
| Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval | | 0 |
| Combating Missing Modalities in Egocentric Videos at Test Time | | 0 |
| Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition | | 0 |
| Combining ConvNets with Hand-Crafted Features for Action Recognition Based on an HMM-SVM Classifier | | 0 |
| Combining Deep Learning Classifiers for 3D Action Recognition | | 0 |
| Combining Spatio-Temporal Appearance Descriptors and Optical Flow for Human Action Recognition in Video Data | | 0 |
| CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices | | 0 |
| Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions | | 0 |
| Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark | | 0 |
| Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method | | 0 |
| Complex Video Action Reasoning via Learnable Markov Logic Network | | 0 |
| Composable Augmentation Encoding for Video Representation Learning | | 0 |
| Compound Prototype Matching for Few-shot Action Recognition | | 0 |
| Comprehensive Video Understanding: Video summarization with content-based video recommender design | | 0 |
| Compressed Video Action Recognition with Refined Motion Vector | | 0 |
| Computer Vision for Primate Behavior Analysis in the Wild | | 0 |
| Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition | | 0 |
| CoNFies: Controllable Neural Face Avatars | | 0 |
| Context-Aware Cross-Attention for Skeleton-Based Human Action Recognition | | 0 |
| Context Aware Graph Convolution for Skeleton-Based Action Recognition | | 0 |
| Context-based Object Viewpoint Estimation: A 2D Relational Approach | | 0 |
| Context-LSTM: a robust classifier for video detection on UCF101 | | 0 |
| Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition | | 0 |
| Continual Learning Improves Zero-Shot Action Recognition | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MViTv2-B (IN-21K + Kinetics400 pretrain) | Top-5 Accuracy | 93.4 | | Unverified |
| 2 | RSANet-R50 (8+16 frames, ImageNet pretrained, 2 clips) | Top-5 Accuracy | 91.1 | | Unverified |
| 3 | MVD (Kinetics400 pretrain, ViT-H, 16 frame) | Top-1 Accuracy | 77.3 | | Unverified |
| 4 | DejaVid | Top-1 Accuracy | 77.2 | | Unverified |
| 5 | InternVideo | Top-1 Accuracy | 77.2 | | Unverified |
| 6 | InternVideo2-1B | Top-1 Accuracy | 77.1 | | Unverified |
| 7 | VideoMAE V2-g | Top-1 Accuracy | 77 | | Unverified |
| 8 | MVD (Kinetics400 pretrain, ViT-L, 16 frame) | Top-1 Accuracy | 76.7 | | Unverified |
| 9 | Hiera-L (no extra data) | Top-1 Accuracy | 76.5 | | Unverified |
| 10 | TubeViT-L | Top-1 Accuracy | 76.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | FTP-UniFormerV2-L/14 | 3-fold Accuracy | 99.7 | | Unverified |
| 2 | OmniVec2 | 3-fold Accuracy | 99.6 | | Unverified |
| 3 | VideoMAE V2-g | 3-fold Accuracy | 99.6 | | Unverified |
| 4 | OmniVec | 3-fold Accuracy | 99.6 | | Unverified |
| 5 | BIKE | 3-fold Accuracy | 98.8 | | Unverified |
| 6 | SMART | 3-fold Accuracy | 98.64 | | Unverified |
| 7 | OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 3-fold Accuracy | 98.6 | | Unverified |
| 8 | PERF-Net (multi-distilled S3D) | 3-fold Accuracy | 98.6 | | Unverified |
| 9 | ZeroI2V ViT-L/14 | 3-fold Accuracy | 98.6 | | Unverified |
| 10 | LGD-3D Two-stream | 3-fold Accuracy | 98.2 | | Unverified |
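
The benchmark results report Top-1, Top-5, and 3-fold accuracy. As a rough illustration of how such numbers are commonly computed, here is a minimal sketch in plain Python. It assumes that Top-k accuracy is the percentage of clips whose true class is among the k highest-scoring predictions, and that 3-fold accuracy is the mean accuracy over three evaluation splits; the page itself does not define these metrics. The function names and the toy scores are hypothetical and do not come from any listed model.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Percentage of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the k best.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


def mean_over_folds(per_fold_accuracy: Sequence[float]) -> float:
    """Average accuracy across evaluation splits (assumed meaning of '3-fold')."""
    if not per_fold_accuracy:
        raise ValueError("at least one fold is required")
    return sum(per_fold_accuracy) / len(per_fold_accuracy)


# Toy data (hypothetical): 4 clips scored against 3 action classes.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
toy_labels = [0, 2, 0, 1]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
print(top1)  # 50.0
print(top2)  # 100.0
print(mean_over_folds([98.0, 99.0, 100.0]))  # 99.0
```

On the toy data, the first two clips are classified correctly outright, while the last two only have their true class in second place, so the Top-2 figure is higher than Top-1. Real evaluations typically also average scores over several clips or crops per video before ranking, which this sketch omits.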