SOTAVerified

Action Recognition

Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.

In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.

Note that some benchmarks, e.g. Kinetics-400, may be listed under the Action Classification or Video Classification tasks instead.

Papers

Showing 1351-1400 of 2759 papers

Title | Status | Hype
CCS: Continuous Learning for Customized Incremental Wireless Sensing Services |  | 0
Unifying Graph Embedding Features with Graph Convolutional Networks for Skeleton-based Action Recognition |  | 0
Challenge report:VIPriors Action Recognition Challenge |  | 0
Challenges of the Creation of a Dataset for Vision Based Human Hand Action Recognition in Industrial Assembly |  | 0
CHAM: action recognition using convolutional hierarchical attention model |  | 0
Channel-Temporal Attention for First-Person Video Domain Adaptation |  | 0
ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild |  | 0
Chop & Learn: Recognizing and Generating Object-State Compositions |  | 0
Classifying action correctness in physical rehabilitation exercises |  | 0
Classifying Object Manipulation Actions based on Grasp-types and Motion-Constraints |  | 0
Classifying Soccer Ball-on-Goal Position Through Kicker Shooting Action |  | 0
Class-Incremental Learning for Action Recognition in Videos |  | 0
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition |  | 0
Clean Text and Full-Body Transformer: Microsoft's Submission to the WMT22 Shared Task on Sign Language Translation |  | 0
CLTA: Contents and Length-based Temporal Attention for Few-shot Action Recognition |  | 0
CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition |  | 0
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition |  | 0
CNN-Based Action Recognition and Pose Estimation for Classifying Animal Behavior from Videos: A Survey |  | 0
CNN-based Action Recognition and Supervised Domain Adaptation on 3D Body Skeletons via Kernel Feature Maps |  | 0
Coding Kendall's Shape Trajectories for 3D Action Recognition |  | 0
Coherent Temporal Synthesis for Incremental Action Segmentation |  | 0
Collaborative Attention Mechanism for Multi-View Action Recognition |  | 0
Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition |  | 0
Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics |  | 0
Collaboratively Self-supervised Video Representation Learning for Action Recognition |  | 0
Collecting and Annotating the Large Continuous Action Dataset |  | 0
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval |  | 0
Combating Missing Modalities in Egocentric Videos at Test Time |  | 0
Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition |  | 0
Combining ConvNets with Hand-Crafted Features for Action Recognition Based on an HMM-SVM Classifier |  | 0
Combining Deep Learning Classifiers for 3D Action Recognition |  | 0
Combining Spatio-Temporal Appearance Descriptors and Optical Flow for Human Action Recognition in Video Data |  | 0
CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices |  | 0
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions |  | 0
Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark |  | 0
Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method |  | 0
Complex Video Action Reasoning via Learnable Markov Logic Network |  | 0
Composable Augmentation Encoding for Video Representation Learning |  | 0
Compound Prototype Matching for Few-shot Action Recognition |  | 0
Comprehensive Video Understanding: Video summarization with content-based video recommender design |  | 0
Compressed Video Action Recognition with Refined Motion Vector |  | 0
Computer Vision for Primate Behavior Analysis in the Wild |  | 0
Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition |  | 0
CoNFies: Controllable Neural Face Avatars |  | 0
Context-Aware Cross-Attention for Skeleton-Based Human Action Recognition |  | 0
Context Aware Graph Convolution for Skeleton-Based Action Recognition |  | 0
Context-based Object Viewpoint Estimation: A 2D Relational Approach |  | 0
Context-LSTM: a robust classifier for video detection on UCF101 |  | 0
Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition |  | 0
Continual Learning Improves Zero-Shot Action Recognition |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MViTv2-B (IN-21K + Kinetics400 pretrain) | Top-5 Accuracy | 93.4 |  | Unverified
2 | RSANet-R50 (8+16 frames, ImageNet pretrained, 2 clips) | Top-5 Accuracy | 91.1 |  | Unverified
3 | MVD (Kinetics400 pretrain, ViT-H, 16 frame) | Top-1 Accuracy | 77.3 |  | Unverified
4 | InternVideo | Top-1 Accuracy | 77.2 |  | Unverified
5 | DejaVid | Top-1 Accuracy | 77.2 |  | Unverified
6 | InternVideo2-1B | Top-1 Accuracy | 77.1 |  | Unverified
7 | VideoMAE V2-g | Top-1 Accuracy | 77 |  | Unverified
8 | MVD (Kinetics400 pretrain, ViT-L, 16 frame) | Top-1 Accuracy | 76.7 |  | Unverified
9 | Hiera-L (no extra data) | Top-1 Accuracy | 76.5 |  | Unverified
10 | TubeViT-L | Top-1 Accuracy | 76.1 |  | Unverified
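
The Top-1 and Top-5 Accuracy metrics in the table above count a prediction as correct when the true action class is among the model's k highest-scoring classes. The following is a minimal sketch of that computation in plain Python. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from any benchmark's evaluation code. The result is a fraction, whereas the leaderboard values are percentages.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; real benchmarks may differ in
    details such as tie-breaking or clip/crop averaging.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")

    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data (made up): 4 video clips, 3 action classes.
scores = [
    [0.1, 0.7, 0.2],  # true class 1 is ranked first
    [0.5, 0.3, 0.2],  # true class 1 is ranked second
    [0.2, 0.3, 0.5],  # true class 0 is ranked third
    [0.6, 0.3, 0.1],  # true class 0 is ranked first
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Multiplying the returned fraction by 100 gives a value on the same scale as the Claimed column.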

# | Model | Metric | Claimed | Verified | Status
1 | FTP-UniFormerV2-L/14 | 3-fold Accuracy | 99.7 |  | Unverified
2 | OmniVec2 | 3-fold Accuracy | 99.6 |  | Unverified
3 | OmniVec | 3-fold Accuracy | 99.6 |  | Unverified
4 | VideoMAE V2-g | 3-fold Accuracy | 99.6 |  | Unverified
5 | BIKE | 3-fold Accuracy | 98.8 |  | Unverified
6 | SMART | 3-fold Accuracy | 98.64 |  | Unverified
7 | ZeroI2V ViT-L/14 | 3-fold Accuracy | 98.6 |  | Unverified
8 | OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 3-fold Accuracy | 98.6 |  | Unverified
9 | PERF-Net (multi-distilled S3D) | 3-fold Accuracy | 98.6 |  | Unverified
10 | Text4Vis | 3-fold Accuracy | 98.2 |  | Unverified