SOTAVerified

Action Classification

Papers

Showing 1–50 of 457 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
VideoMamba: State Space Model for Efficient Video Understanding | Code | 5
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Code | 4
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3
Towards Universal Soccer Video Understanding | Code | 3
Is Space-Time Attention All You Need for Video Understanding? | Code | 2
Omni-sourced Webly-supervised Learning for Video Recognition | Code | 2
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Code | 2
Video Swin Transformer | Code | 2
Temporal Segment Networks for Action Recognition in Videos | Code | 2
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | Code | 2
Omnivore: A Single Model for Many Visual Modalities | Code | 2
Learning Video Representations from Large Language Models | Code | 2
AIM: Adapting Image Models for Efficient Video Action Recognition | Code | 2
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Code | 2
X3D: Expanding Architectures for Efficient Video Recognition | Code | 2
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Code | 2
MARLIN: Masked Autoencoder for facial video Representation LearnINg | Code | 2
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Code | 2
Infrared and 3D skeleton feature fusion for RGB-D action recognition | Code | 1
Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living | Code | 1
High Quality Monocular Depth Estimation via Transfer Learning | Code | 1
A Closer Look at Spatiotemporal Convolutions for Action Recognition | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Frozen CLIP Models are Efficient Video Learners | Code | 1
Weakly-supervised Temporal Action Localization by Uncertainty Modeling | Code | 1
HierVL: Learning Hierarchical Video-Language Embeddings | Code | 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | Code | 1
Enriching Local and Global Contexts for Temporal Action Localization | Code | 1
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding | Code | 1
Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing | Code | 1
An Image is Worth 16x16 Words, What is a Video Worth? | Code | 1
CT-Net: Channel Tensorization Network for Video Classification | Code | 1
An Evaluation of Action Recognition Models on EPIC-Kitchens | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Code | 1
Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing | Code | 1
Dual-path Adaptation from Image to Video Transformers | Code | 1
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks | Code | 1
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | Code | 1
AViD Dataset: Anonymized Videos from Diverse Countries | Code | 1
BABEL: Bodies, Action and Behavior with English Labels | Code | 1
Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition | Code | 1
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition | Code | 1

No leaderboard results yet.