SOTAVerified

Action Recognition

Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.

In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.

Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.

Papers

Showing 2526-2550 of 2759 papers

Title | Status | Hype
Emotion-Based Crowd Representation for Abnormality Detection |  | 0
Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition |  | 0
Kinematic-Layout-aware Random Forests for Depth-based Action Recognition |  | 0
Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking |  | 0
Hierarchical Attention Network for Action Recognition in Videos |  | 0
Annotation Methodologies for Vision and Language Dataset Creation |  | 0
Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection |  | 0
Action Recognition with Joint Attention on Multi-Level Deep Features |  | 0
Zero-Shot Visual Recognition via Bidirectional Latent Embedding |  | 0
VideoLSTM Convolves, Attends and Flows for Action Recognition | Code | 0
Rolling Rotations for Recognizing Human Actions from 3D Skeletal Data |  | 0
A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets |  | 0
Hand Action Detection from Ego-centric Depth Sequences with Error-correcting Hough Transform |  | 0
Force From Motion: Decoding Physical Sensation in a First Person Video |  | 0
A Multi-Stream Bi-Directional Recurrent Neural Network for Fine-Grained Action Detection |  | 0
Discriminative Hierarchical Rank Pooling for Activity Recognition |  | 0
Pairwise Linear Regression Classification for Image Set Retrieval |  | 0
A Key Volume Mining Deep Framework for Action Recognition |  | 0
Thin-Slicing for Pose: Learning to Understand Pose Without Explicit Pose Estimation |  | 0
You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images |  | 0
Dynamic Image Networks for Action Recognition | Code | 0
3D Action Recognition From Novel Viewpoints |  | 0
Cascaded Interactional Targeting Network for Egocentric Video Analysis |  | 0
Efficient Temporal Sequence Comparison and Classification Using Gram Matrix Embeddings on a Riemannian Manifold |  | 0
First Person Action Recognition Using Deep Learned Descriptors |  | 0
Page 102 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MViTv2-B (IN-21K + Kinetics400 pretrain) | Top-5 Accuracy | 93.4 |  | Unverified
2 | RSANet-R50 (8+16 frames, ImageNet pretrained, 2 clips) | Top-5 Accuracy | 91.1 |  | Unverified
3 | MVD (Kinetics400 pretrain, ViT-H, 16 frame) | Top-1 Accuracy | 77.3 |  | Unverified
4 | InternVideo | Top-1 Accuracy | 77.2 |  | Unverified
5 | DejaVid | Top-1 Accuracy | 77.2 |  | Unverified
6 | InternVideo2-1B | Top-1 Accuracy | 77.1 |  | Unverified
7 | VideoMAE V2-g | Top-1 Accuracy | 77 |  | Unverified
8 | MVD (Kinetics400 pretrain, ViT-L, 16 frame) | Top-1 Accuracy | 76.7 |  | Unverified
9 | Hiera-L (no extra data) | Top-1 Accuracy | 76.5 |  | Unverified
10 | TubeViT-L | Top-1 Accuracy | 76.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FTP-UniFormerV2-L/14 | 3-fold Accuracy | 99.7 |  | Unverified
2 | OmniVec2 | 3-fold Accuracy | 99.6 |  | Unverified
3 | OmniVec | 3-fold Accuracy | 99.6 |  | Unverified
4 | VideoMAE V2-g | 3-fold Accuracy | 99.6 |  | Unverified
5 | BIKE | 3-fold Accuracy | 98.8 |  | Unverified
6 | SMART | 3-fold Accuracy | 98.64 |  | Unverified
7 | ZeroI2V ViT-L/14 | 3-fold Accuracy | 98.6 |  | Unverified
8 | OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 3-fold Accuracy | 98.6 |  | Unverified
9 | PERF-Net (multi-distilled S3D) | 3-fold Accuracy | 98.6 |  | Unverified
10 | Text4Vis | 3-fold Accuracy | 98.2 |  | Unverified
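
For readers unfamiliar with the metrics named in the tables above, the following is a minimal sketch of how Top-1 and Top-5 style accuracy is commonly computed from per-class scores, and how a multi-fold figure could be averaged. It assumes that "3-fold Accuracy" means the mean accuracy over three train/test splits; the page itself does not define it. The function names `top_k_accuracy` and `mean_over_folds` and the toy scores are hypothetical, not taken from any listed paper or library.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score and keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def mean_over_folds(fold_accuracies: Sequence[float]) -> float:
    """Average accuracy across splits (assumed meaning of a multi-fold figure)."""
    if not fold_accuracies:
        raise ValueError("at least one fold accuracy is required")
    return sum(fold_accuracies) / len(fold_accuracies)


# Toy example: 4 clips, 3 action classes (made-up scores, for illustration only).
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
three_fold = mean_over_folds([0.9, 0.8, 0.7])
```

In practice the scores would come from a trained video classifier, and leaderboard values are reported as percentages, so these fractions would be multiplied by 100.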