SOTAVerified

Action Classification

Papers

Showing 101-150 of 457 papers

Title | Status | Hype
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | Code | 1
roadscene2vec: A Tool for Extracting and Embedding Road Scene-Graphs | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition | Code | 1
A Closer Look at Spatiotemporal Convolutions for Action Recognition | Code | 1
Actor-agnostic Multi-label Action Recognition with Multi-modal Query | Code | 1
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Code | 1
Revisiting ResNets: Improved Training and Scaling Strategies | Code | 1
Infrared and 3D skeleton feature fusion for RGB-D action recognition | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
Self-supervised Video Transformer | Code | 1
Mutual Modality Learning for Video Action Classification | Code | 1
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Code | 1
Visual Semantic Role Labeling | Code | 1
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders | Code | 1
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment | Code | 1
Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition | Code | 1
Dual-path Adaptation from Image to Video Transformers | Code | 1
ReAct: Temporal Action Detection with Relational Queries | Code | 1
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition | Code | 1
BABEL: Bodies, Action and Behavior with English Labels | Code | 1
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | Code | 1
Region-based Non-local Operation for Video Classification | Code | 1
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization | Code | 1
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding | Code | 1
Enriching Local and Global Contexts for Temporal Action Localization | Code | 1
Delving Deep into One-Shot Skeleton-based Action Recognition with Diverse Occlusions | Code | 1
ViViT: A Video Vision Transformer | Code | 1
Post-Processing Temporal Action Detection | Code | 1
Proposal Relation Network for Temporal Action Detection | Code | 1
Representation Learning via Global Temporal Alignment and Cycle-Consistency | Code | 1
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | Code | 1
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | Code | 1
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1
Boundary-sensitive Pre-training for Temporal Localization in Videos | Code | 1
High Quality Monocular Depth Estimation via Transfer Learning | Code | 1
HierVL: Learning Hierarchical Video-Language Embeddings | Code | 1
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | Code | 1
Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing | Code | 1
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1
Stand-Alone Inter-Frame Attention in Video Models | Code | 1
Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning | Code | 0
Do we really need temporal convolutions in action segmentation? | Code | 0
Object Priors for Classifying and Localizing Unseen Actions | Code | 0
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Code | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis | Code | 0
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution | Code | 0

No leaderboard results yet.