SOTAVerified

Action Classification

Papers

Showing 1–25 of 457 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
VideoMamba: State Space Model for Efficient Video Understanding | Code | 5
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Code | 4
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Code | 3
Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3
Towards Universal Soccer Video Understanding | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Video Swin Transformer | Code | 2
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Code | 2
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Code | 2
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Code | 2
Temporal Segment Networks for Action Recognition in Videos | Code | 2
X3D: Expanding Architectures for Efficient Video Recognition | Code | 2
Omni-sourced Webly-supervised Learning for Video Recognition | Code | 2
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | Code | 2
Is Space-Time Attention All You Need for Video Understanding? | Code | 2
Learning Video Representations from Large Language Models | Code | 2
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Code | 2
AIM: Adapting Image Models for Efficient Video Action Recognition | Code | 2
Omnivore: A Single Model for Many Visual Modalities | Code | 2
MARLIN: Masked Autoencoder for facial video Representation LearnINg | Code | 2
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
Class-Difficulty Based Methods for Long-Tailed Visual Recognition | Code | 1
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | Code | 1
Page 1 of 19

No leaderboard results yet.