SOTAVerified

Action Recognition

Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.

In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a boost in performance, similar to the one seen from large-scale pretraining in the image domain, when the network is applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.

Note that some benchmarks, e.g. Kinetics-400, may be listed under the Action Classification or Video Classification tasks instead.

Papers

Showing 1651–1700 of 2759 papers

| Title | Status | Hype |
| --- | --- | --- |
| Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection | | 0 |
| Relational Action Forecasting | | 0 |
| Relational Long Short-Term Memory for Video Action Recognition | | 0 |
| Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement | | 0 |
| Representation Learning via Adversarially-Contrastive Optimal Transport | | 0 |
| Representation Learning with Video Deep InfoMax | | 0 |
| Representing Videos as Discriminative Sub-graphs for Action Recognition | | 0 |
| Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition | | 0 |
| RESOUND: Towards Action Recognition without Representation Bias | | 0 |
| REST: REtrieve & Self-Train for generative action recognition | | 0 |
| Rethinking Full Connectivity in Recurrent Neural Networks | | 0 |
| Rethinking Image-to-Video Adaptation: An Object-centric Perspective | | 0 |
| Rethinking matching-based few-shot action recognition | | 0 |
| Rethinking Top Probability from Multi-view for Distracted Driver Behaviour Localization | | 0 |
| Retrieving and Highlighting Action with Spatiotemporal Reference | | 0 |
| Multi-Task Learning of Generalizable Representations for Video Action Recognition | | 0 |
| Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction | | 0 |
| Review on Action Recognition for Accident Detection in Smart City Transportation Systems | | 0 |
| Revisiting Human Action Recognition: Personalization vs. Generalization | | 0 |
| Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition | | 0 |
| RGB-D-based Action Recognition Datasets: A Survey | | 0 |
| RGB-D Based Action Recognition with Light-weight 3D Convolutional Networks | | 0 |
| RGB Video Based Tennis Action Recognition Using a Deep Historical Long Short-Term Memory | | 0 |
| Riemannian batch normalization for SPD neural networks | | 0 |
| RNN Fisher Vectors for Action Recognition and Image Annotation | | 0 |
| RNN for Affects at SemEval-2018 Task 1: Formulating Affect Identification as a Binary Classification Problem | | 0 |
| RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model | | 0 |
| Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions | | 0 |
| Robust Audio-Visual Instance Discrimination | | 0 |
| Robust Estimation of 3D Human Poses from a Single Image | | 0 |
| Robust features for facial action recognition | | 0 |
| Robust Multi-body Feature Tracker: A Segmentation-free Approach | | 0 |
| Robustness Evaluation of Machine Learning Models for Robot Arm Action Recognition in Noisy Environments | | 0 |
| Robust Statistical Approach for Extraction of Moving Human Silhouettes from Videos | | 0 |
| Rolling Rotations for Recognizing Human Actions from 3D Skeletal Data | | 0 |
| RotaTR: Detection Transformer for Dense and Rotated Object | | 0 |
| RSA: Randomized Simulation as Augmentation for Robust Human Action Recognition | | 0 |
| R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition | | 0 |
| S3Aug: Segmentation, Sampling, and Shift for Action Recognition | | 0 |
| S3TC: Spiking Separated Spatial and Temporal Convolutions with Unsupervised STDP-based Learning for Action Recognition | | 0 |
| SAFCAR: Structured Attention Fusion for Compositional Action Recognition | | 0 |
| SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021 | | 0 |
| Sampling Strategies for Real-Time Action Recognition | | 0 |
| SAR-NAS: Skeleton-based Action Recognition via Neural Architecture Searching | | 0 |
| What and Where: Modeling Skeletons from Semantic and Spatial Perspectives for Action Recognition | | 0 |
| Scalable and Compact 3D Action Recognition with Approximated RBF Kernel Machines | | 0 |
| Scale Coding Bag of Deep Features for Human Attribute and Action Recognition | | 0 |
| SCA Net: Sparse Channel Attention Module for Action Recognition | | 0 |
| SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition | | 0 |
| Scene Flow to Action Map: A New Representation for RGB-D based Action Recognition with Convolutional Neural Networks | | 0 |
Page 34 of 56

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MViTv2-B (IN-21K + Kinetics400 pretrain) | Top-5 Accuracy | 93.4 | | Unverified |
| 2 | RSANet-R50 (8+16 frames, ImageNet pretrained, 2 clips) | Top-5 Accuracy | 91.1 | | Unverified |
| 3 | MVD (Kinetics400 pretrain, ViT-H, 16 frame) | Top-1 Accuracy | 77.3 | | Unverified |
| 4 | DejaVid | Top-1 Accuracy | 77.2 | | Unverified |
| 5 | InternVideo | Top-1 Accuracy | 77.2 | | Unverified |
| 6 | InternVideo2-1B | Top-1 Accuracy | 77.1 | | Unverified |
| 7 | VideoMAE V2-g | Top-1 Accuracy | 77 | | Unverified |
| 8 | MVD (Kinetics400 pretrain, ViT-L, 16 frame) | Top-1 Accuracy | 76.7 | | Unverified |
| 9 | Hiera-L (no extra data) | Top-1 Accuracy | 76.5 | | Unverified |
| 10 | TubeViT-L | Top-1 Accuracy | 76.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | FTP-UniFormerV2-L/14 | 3-fold Accuracy | 99.7 | | Unverified |
| 2 | OmniVec2 | 3-fold Accuracy | 99.6 | | Unverified |
| 3 | VideoMAE V2-g | 3-fold Accuracy | 99.6 | | Unverified |
| 4 | OmniVec | 3-fold Accuracy | 99.6 | | Unverified |
| 5 | BIKE | 3-fold Accuracy | 98.8 | | Unverified |
| 6 | SMART | 3-fold Accuracy | 98.64 | | Unverified |
| 7 | OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 3-fold Accuracy | 98.6 | | Unverified |
| 8 | PERF-Net (multi-distilled S3D) | 3-fold Accuracy | 98.6 | | Unverified |
| 9 | ZeroI2V ViT-L/14 | 3-fold Accuracy | 98.6 | | Unverified |
| 10 | LGD-3D Two-stream | 3-fold Accuracy | 98.2 | | Unverified |
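
The benchmark results above report Top-1, Top-5, and 3-fold accuracy. As a rough illustration of how such numbers are typically computed, here is a minimal Python sketch. The function names `top_k_accuracy` and `mean_over_folds` and the toy scores are hypothetical and do not come from any listed paper. The sketch also assumes that "3-fold accuracy" means the plain average of accuracies over three train/test splits, which this page does not state.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores[i][c] is the model's score for class c on sample i (hypothetical input format).
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


def mean_over_folds(fold_accuracies: Sequence[float]) -> float:
    """Average accuracy across splits (assumed meaning of '3-fold accuracy')."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Toy example: 4 clips, 3 action classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)  # 50.0
top2 = top_k_accuracy(scores, labels, k=2)  # 75.0
```

Top-k accuracy can never be lower than Top-1 on the same predictions, which is why the Top-5 entries in a leaderboard are not directly comparable with the Top-1 entries.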