SOTAVerified

Action Classification

Papers

Showing 1–50 of 457 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
VideoMamba: State Space Model for Efficient Video Understanding | Code | 5
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Code | 4
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3
Towards Universal Soccer Video Understanding | Code | 3
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Code | 3
Omni-sourced Webly-supervised Learning for Video Recognition | Code | 2
Is Space-Time Attention All You Need for Video Understanding? | Code | 2
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Code | 2
Video Swin Transformer | Code | 2
X3D: Expanding Architectures for Efficient Video Recognition | Code | 2
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | Code | 2
Omnivore: A Single Model for Many Visual Modalities | Code | 2
Learning Video Representations from Large Language Models | Code | 2
AIM: Adapting Image Models for Efficient Video Action Recognition | Code | 2
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Code | 2
MARLIN: Masked Autoencoder for facial video Representation LearnINg | Code | 2
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Code | 2
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Code | 2
Temporal Segment Networks for Action Recognition in Videos | Code | 2
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks | Code | 1
CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Code | 1
Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living | Code | 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
A Closer Look at Spatiotemporal Convolutions for Action Recognition | Code | 1
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders | Code | 1
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | Code | 1
ConvNet Architecture Search for Spatiotemporal Feature Learning | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
CT-Net: Channel Tensorization Network for Video Classification | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Infrared and 3D skeleton feature fusion for RGB-D action recognition | Code | 1
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | Code | 1
Can Deep Learning Recognize Subtle Human Activities? | Code | 1
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Code | 1
Frozen CLIP Models are Efficient Video Learners | Code | 1
BABEL: Bodies, Action and Behavior with English Labels | Code | 1
An Image is Worth 16x16 Words, What is a Video Worth? | Code | 1
Weakly-supervised Temporal Action Localization by Uncertainty Modeling | Code | 1
An Evaluation of Action Recognition Models on EPIC-Kitchens | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
Florence: A New Foundation Model for Computer Vision | Code | 1
Boundary-sensitive Pre-training for Temporal Localization in Videos | Code | 1
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues | Code | 1
CAST: Cross-Attention in Space and Time for Video Action Recognition | Code | 1
Class-Difficulty Based Methods for Long-Tailed Visual Recognition | Code | 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1
Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing | Code | 1
Page 1 of 10

No leaderboard results yet.