| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7 |
| VideoMamba: State Space Model for Efficient Video Understanding | Mar 11, 2024 | Action Classification, Mamba | Code Available | 5 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification, Image Classification | Code Available | 4 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Dec 6, 2022 | Action Classification, Action Recognition | Code Available | 4 |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Mar 23, 2022 | 4k, Action Classification | Code Available | 3 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi, Action Classification | Code Available | 3 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action Classification, Action Recognition | Code Available | 3 |
| Towards Universal Soccer Video Understanding | Dec 2, 2024 | Action Classification, Sports Understanding | Code Available | 3 |
| Is Space-Time Attention All You Need for Video Understanding? | Feb 9, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| Omni-sourced Webly-supervised Learning for Video Recognition | Mar 29, 2020 | Action Classification, Action Recognition | Code Available | 2 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| Video Swin Transformer | Jun 24, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| Temporal Segment Networks for Action Recognition in Videos | May 8, 2017 | Action Classification, Action Recognition | Code Available | 2 |
| Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | Aug 2, 2016 | Action Classification, Action Recognition | Code Available | 2 |
| Omnivore: A Single Model for Many Visual Modalities | Jan 20, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| Learning Video Representations from Large Language Models | Dec 8, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| AIM: Adapting Image Models for Efficient Video Action Recognition | Feb 6, 2023 | Action Classification, Action Recognition | Code Available | 2 |
| UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Sep 22, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| X3D: Expanding Architectures for Efficient Video Recognition | Apr 9, 2020 | Action Classification, feature selection | Code Available | 2 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Dec 31, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| MARLIN: Masked Autoencoder for facial video Representation LearnINg | Nov 12, 2022 | Action Classification, Attribute | Code Available | 2 |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Mar 29, 2023 | Action Classification, Action Recognition | Code Available | 2 |
| Infrared and 3D skeleton feature fusion for RGB-D action recognition | Feb 28, 2020 | Action Classification, Action Recognition | Code Available | 1 |
| Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living | Nov 30, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| High Quality Monocular Depth Estimation via Transfer Learning | Dec 31, 2018 | Action Classification, Decoder | Code Available | 1 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | Nov 30, 2017 | Action Classification, Action Recognition | Code Available | 1 |
| MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Dec 2, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders | Nov 16, 2022 | Action Classification, Representation Learning | Code Available | 1 |
| Large Scale Holistic Video Understanding | Apr 25, 2019 | Action Classification, Action Recognition | Code Available | 1 |
| Frozen CLIP Models are Efficient Video Learners | Aug 6, 2022 | Action Classification, Decoder | Code Available | 1 |
| Weakly-supervised Temporal Action Localization by Uncertainty Modeling | Jun 12, 2020 | Action Classification, Action Localization | Code Available | 1 |
| HierVL: Learning Hierarchical Video-Language Embeddings | Jan 5, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Apr 20, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | Jun 9, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| Enriching Local and Global Contexts for Temporal Action Localization | Jul 27, 2021 | Action Classification, Action Localization | Code Available | 1 |
| EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding | Jun 13, 2024 | Action Classification, Action Localization | Code Available | 1 |
| Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing | Mar 19, 2024 | Action Classification, Deep Learning | Code Available | 1 |
| An Image is Worth 16x16 Words, What is a Video Worth? | Mar 25, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| CT-Net: Channel Tensorization Network for Video Classification | Jun 3, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| An Evaluation of Action Recognition Models on EPIC-Kitchens | Aug 2, 2019 | Action Classification, Action Recognition | Code Available | 1 |
| An Empirical Study of End-to-End Temporal Action Detection | Apr 6, 2022 | Action Classification, Action Detection | Code Available | 1 |
| CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Aug 20, 2024 | Action Classification, Action Classification (1-shot) | Code Available | 1 |
| Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing | Sep 30, 2020 | Action Classification, Video Recognition | Code Available | 1 |
| Dual-path Adaptation from Image to Video Transformers | Mar 17, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks | Nov 14, 2021 | Action Classification, Object | Code Available | 1 |
| EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | Aug 10, 2024 | Action Classification, Action Recognition | Code Available | 1 |
| AViD Dataset: Anonymized Videos from Diverse Countries | Jul 10, 2020 | Action Classification, Action Detection | Code Available | 1 |
| BABEL: Bodies, Action and Behavior with English Labels | Jun 17, 2021 | 3D Action Recognition, Action Classification | Code Available | 1 |
| Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition | Nov 8, 2024 | Action Classification, Activity Recognition | Code Available | 1 |
| DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition | Mar 19, 2022 | Action Classification, Action Recognition | Code Available | 1 |