SOTAVerified

Action Classification

Papers

Showing 51-100 of 457 papers

Title | Status | Hype
ViA: View-invariant Skeleton Action Representation Learning via Motion Retargeting | Code | 1
Frozen CLIP Models are Efficient Video Learners | Code | 1
Class-Difficulty Based Methods for Long-Tailed Visual Recognition | Code | 1
Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | Code | 1
MAR: Masked Autoencoders for Efficient Action Recognition | Code | 1
ReAct: Temporal Action Detection with Relational Queries | Code | 1
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Code | 1
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos | Code | 1
Stand-Alone Inter-Frame Attention in Video Models | Code | 1
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos | Code | 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
SPAct: Self-supervised Privacy Preservation for Action Recognition | Code | 1
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Code | 1
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition | Code | 1
OpenTAL: Towards Open Set Temporal Action Localization | Code | 1
Delving Deep into One-Shot Skeleton-based Action Recognition with Diverse Occlusions | Code | 1
Learning To Recognize Procedural Activities with Distant Supervision | Code | 1
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | Code | 1
Masked Feature Prediction for Self-Supervised Visual Pre-Training | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
Self-supervised Video Transformer | Code | 1
Florence: A New Foundation Model for Computer Vision | Code | 1
Swin Transformer V2: Scaling Up Capacity and Resolution | Code | 1
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | Code | 1
Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification | Code | 1
Three-Stream 3D/1D CNN for Fine-Grained Action Classification and Segmentation in Table Tennis | Code | 1
ActionCLIP: A New Paradigm for Video Action Recognition | Code | 1
roadscene2vec: A Tool for Extracting and Embedding Road Scene-Graphs | Code | 1
Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition | Code | 1
Video Contrastive Learning with Global Context | Code | 1
Enriching Local and Global Contexts for Temporal Action Localization | Code | 1
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition | Code | 1
Let's Play for Action: Recognizing Activities of Daily Living by Learning from Life Simulation Video Games | Code | 1
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | Code | 1
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | Code | 1
Proposal Relation Network for Temporal Action Detection | Code | 1
BABEL: Bodies, Action and Behavior with English Labels | Code | 1
Space-time Mixing Attention for Video Transformer | Code | 1
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | Code | 1
CT-Net: Channel Tensorization Network for Video Classification | Code | 1
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | Code | 1
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living | Code | 1
Representation Learning via Global Temporal Alignment and Cycle-Consistency | Code | 1
Unsupervised Visual Representation Learning by Tracking Patches in Video | Code | 1
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | Code | 1
Multiscale Vision Transformers | Code | 1

No leaderboard results yet.