SOTAVerified

Action Classification

Papers

Showing 51-100 of 457 papers

Title | Status | Hype
Multiscale Vision Transformers | Code | 1
A Closer Look at Spatiotemporal Convolutions for Action Recognition | Code | 1
MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Code | 1
Representation Learning via Global Temporal Alignment and Cycle-Consistency | Code | 1
Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living | Code | 1
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | Code | 1
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders | Code | 1
Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition | Code | 1
AViD Dataset: Anonymized Videos from Diverse Countries | Code | 1
BABEL: Bodies, Action and Behavior with English Labels | Code | 1
Weakly-supervised Temporal Action Localization by Uncertainty Modeling | Code | 1
Self-supervised Video Transformer | Code | 1
MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos | Code | 1
MoViNets: Mobile Video Networks for Efficient Video Recognition | Code | 1
Mutual Modality Learning for Video Action Classification | Code | 1
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization | Code | 1
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | Code | 1
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding | Code | 1
Memory-augmented Dense Predictive Coding for Video Representation Learning | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
CAST: Cross-Attention in Space and Time for Video Action Recognition | Code | 1
Masked Feature Prediction for Self-Supervised Visual Pre-Training | Code | 1
Can Deep Learning Recognize Subtle Human Activities? | Code | 1
MAR: Masked Autoencoders for Efficient Action Recognition | Code | 1
Class-Difficulty Based Methods for Long-Tailed Visual Recognition | Code | 1
Alleviating Over-segmentation Errors by Detecting Action Boundaries | Code | 1
ActionCLIP: A New Paradigm for Video Action Recognition | Code | 1
Enriching Local and Global Contexts for Temporal Action Localization | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing | Code | 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
An Evaluation of Action Recognition Models on EPIC-Kitchens | Code | 1
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | Code | 1
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Code | 1
Florence: A New Foundation Model for Computer Vision | Code | 1
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | Code | 1
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | Code | 1
ConvNet Architecture Search for Spatiotemporal Feature Learning | Code | 1
Dual-path Adaptation from Image to Video Transformers | Code | 1
Non-local Neural Networks | Code | 1
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues | Code | 1
CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Code | 1
CT-Net: Channel Tensorization Network for Video Classification | Code | 1
Open-Vocabulary Video Relation Extraction | Code | 1
Let's Play for Action: Recognizing Activities of Daily Living by Learning from Life Simulation Video Games | Code | 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1
High Quality Monocular Depth Estimation via Transfer Learning | Code | 1
Large Scale Holistic Video Understanding | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
Learning To Recognize Procedural Activities with Distant Supervision | Code | 1

No leaderboard results yet.