InternVideo2: Scaling Foundation Models for Multimodal Video Understanding Mar 22, 2024 Action Classification Action Recognition
Code Available 75 TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis Oct 5, 2022 Action Recognition Anomaly Detection
Code Available 65 InternVideo: General Video Foundation Models via Generative and Discriminative Learning Dec 6, 2022 Action Classification Action Recognition
Code Available 45 SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models Dec 10, 2024 Action Recognition Spatial Reasoning
Code Available 45 DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework Mar 19, 2025 8k Action Recognition
Code Available 45 Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation Oct 9, 2023 Action Recognition Image Generation
Code Available 45 EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks May 28, 2019 Action Recognition Domain Generalization
Code Available 35 MotionBERT: A Unified Perspective on Learning Human Motion Representations Oct 12, 2022 3D Human Pose Estimation 3D Pose Estimation
Code Available 35 Humans in 4D: Reconstructing and Tracking Humans with Transformers May 31, 2023 3D Human Pose Estimation Action Recognition
Code Available 35 Expanding Language-Image Pretrained Models for General Video Recognition Aug 4, 2022 Action Classification Action Recognition
Code Available 35 VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training Mar 23, 2022 4k Action Classification
Code Available 35 Harnessing Temporal Causality for Advanced Temporal Action Detection Jul 25, 2024 Action Detection Action Recognition
Code Available 35 A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications Jun 2, 2022 Action Recognition Sports Analytics
Code Available 35 Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models Jan 30, 2025 Action Recognition Domain Adaptation
Code Available 35 Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects Mar 25, 2024 Action Recognition Motion Generation
Code Available 35 Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings Mar 25, 2025 4k Action Recognition
Code Available 25 Revealing Single Frame Bias for Video-and-Language Learning Jun 7, 2022 Action Recognition Fine-grained Action Recognition
Code Available 25 Temporal Action Detection with Structured Segment Networks Apr 20, 2017 Action Detection Action Recognition
Code Available 25 Omnivore: A Single Model for Many Visual Modalities Jan 20, 2022 Action Classification Action Recognition
Code Available 25 Omni-sourced Webly-supervised Learning for Video Recognition Mar 29, 2020 Action Classification Action Recognition
Code Available 25 On the Benefits of 3D Pose and Tracking for Human Action Recognition Apr 3, 2023 Action Recognition Temporal Action Localization
Code Available 25 Temporal Segment Networks for Action Recognition in Videos May 8, 2017 Action Classification Action Recognition
Code Available 25 LLaVAction: evaluating and training multi-modal large language models for action recognition Mar 24, 2025 Action Recognition Action Understanding
Code Available 25 Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data Feb 8, 2024 Action Recognition Mamba
Code Available 25 Is Weakly-supervised Action Segmentation Ready For Human-Robot Interaction? No, Let's Improve It With Action-union Learning Oct 22, 2023 Action Recognition Action Segmentation
Code Available 25 OmniVid: A Generative Framework for Universal Video Understanding Mar 26, 2024 Action Recognition Decoder
Code Available 25 Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba May 9, 2024 Action Recognition Mamba
Code Available 25 Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition Nov 28, 2024 Action Recognition Skeleton Based Action Recognition
Code Available 25 Hulk: A Universal Knowledge Translator for Human-Centric Tasks Dec 4, 2023 3D Human Pose Estimation Action Recognition
Code Available 25 SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition Mar 14, 2024 Action Recognition Human Interaction Recognition
Code Available 25 Frozen Transformers in Language Models Are Effective Visual Encoder Layers Oct 19, 2023 Action Recognition Image-text Retrieval
Code Available 25 FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition Feb 5, 2024 Action Recognition Open Vocabulary Action Recognition
Code Available 25 Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment Mar 20, 2024 Action Quality Assessment Action Quality Assessment Report Generation
Code Available 25 Learning Spatiotemporal Features with 3D Convolutional Networks Dec 2, 2014 Action Recognition Action Recognition In Videos
Code Available 25 Dynamic 3D Point Cloud Sequences as 2D Videos Mar 2, 2024 Action Recognition Self-Supervised Learning
Code Available 25 Deep Architectures for Content Moderation and Movie Content Rating Dec 8, 2022 Action Recognition Genre classification
Code Available 25 AIM: Adapting Image Models for Efficient Video Action Recognition Feb 6, 2023 Action Classification Action Recognition
Code Available 25 DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition Mar 25, 2024 Action Recognition Skeleton Based Action Recognition
Code Available 25 EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation Jun 26, 2024 Action Anticipation Action Recognition
Code Available 25 Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models Dec 31, 2022 Action Classification Action Recognition
Code Available 25 AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition May 26, 2022 Action Recognition Video Recognition
Code Available 25 HAKE: A Knowledge Engine Foundation for Human Activity Understanding Feb 14, 2022 Action Recognition Human-Object Interaction Detection
Code Available 25 BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition Jan 1, 2024 Action Recognition Skeleton Based Action Recognition
Code Available 25 AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation Jul 5, 2024 Action Recognition Few-Shot Image Classification
Code Available 25 ActionFormer: Localizing Moments of Actions with Transformers Feb 16, 2022 Action Localization Action Recognition
Code Available 25 Is Space-Time Attention All You Need for Video Understanding? Feb 9, 2021 Action Classification Action Recognition
Code Available 25 Learning Video Representations from Large Language Models Dec 8, 2022 Action Classification Action Recognition
Code Available 25 Leveraging Temporal Contextualization for Video Action Recognition Apr 15, 2024 Action Recognition Temporal Action Localization
Code Available 25 Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? Nov 27, 2017 Action Recognition
Code Available 25 Egocentric Video-Language Pretraining Jun 3, 2022 Action Recognition Contrastive Learning
Code Available 25