SOTAVerified

Action Classification

Papers

Showing 51–100 of 457 papers

| Title | Status | Hype |
|---|---|---|
| Open-Vocabulary Video Relation Extraction | Code | 1 |
| No More Shortcuts: Realizing the Potential of Temporal Self-Supervision |  | 0 |
| ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room |  | 0 |
| Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living | Code | 1 |
| CAST: Cross-Attention in Space and Time for Video Action Recognition | Code | 1 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition |  | 0 |
| Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1 |
| ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization |  | 0 |
| Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities |  | 0 |
| OmniVec: Learning robust representations with cross modal sharing |  | 0 |
| Asymmetric Masked Distillation for Pre-Training Small Foundation Models | Code | 0 |
| After-Stroke Arm Paresis Detection using Kinematic Data |  | 0 |
| Proposal-based Temporal Action Localization with Point-level Supervision |  | 0 |
| ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video | Code | 1 |
| SkeleTR: Towrads Skeleton-based Action Recognition in the Wild |  | 0 |
| MOFO: MOtion FOcused Self-Supervision for Video Understanding | Code | 0 |
| Progression-Guided Temporal Action Detection in Videos | Code | 0 |
| ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1 |
| Temporally-Adaptive Models for Efficient Video Understanding | Code | 0 |
| Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification | Code | 0 |
| Actor-agnostic Multi-label Action Recognition with Multi-modal Query | Code | 1 |
| What Can Simple Arithmetic Operations Do for Temporal Modeling? | Code | 1 |
| Semi Supervised Meta Learning for Spatiotemporal Learning |  | 0 |
| Spiking Two-Stream Methods with Unsupervised STDP-based Learning for Action Recognition |  | 0 |
| Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Code | 1 |
| How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks |  | 0 |
| Human Action Recognition in Egocentric Perspective Using 2D Object and Hands Pose |  | 0 |
| HomE: Homography-Equivariant Video Representation Learning | Code | 0 |
| Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | Code | 0 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3 |
| Self-Supervised Video Representation Learning via Latent Time Navigation |  | 0 |
| AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation | Code | 1 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1 |
| VicTR: Video-conditioned Text Representations for Activity Recognition |  | 0 |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Code | 2 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Code | 0 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | Code | 1 |
| Multi-modal Prompting for Low-Shot Temporal Action Localization |  | 0 |
| ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders | Code | 0 |
| Dual-path Adaptation from Image to Video Transformers | Code | 1 |
| Classification of Primitive Manufacturing Tasks from Filtered Event Data |  | 0 |
| Scaling Vision Transformers to 22 Billion Parameters | Code | 0 |
| Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks | Code | 0 |
| AIM: Adapting Image Models for Efficient Video Action Recognition | Code | 2 |
| Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms | Code | 0 |
| Deep Dependency Networks for Multi-Label Classification |  | 0 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Code | 4 |
| Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework |  | 0 |
| HierVL: Learning Hierarchical Video-Language Embeddings | Code | 1 |
| ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded |  | 0 |

No leaderboard results yet.