| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7 |
| Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations | Oct 10, 2024 | Time Series Forecasting, Video Recognition | Code Available | 5 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action Classification, Action Recognition | Code Available | 3 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake Detection, Mamba | Code Available | 2 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Dec 31, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device | Sep 27, 2021 | Video Recognition, Video Understanding | Code Available | 2 |
| X3D: Expanding Architectures for Efficient Video Recognition | Apr 9, 2020 | Action Classification, feature selection | Code Available | 2 |
| Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Dec 15, 2024 | Computational Efficiency, Video Recognition | Code Available | 2 |
| Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs? | Apr 10, 2020 | General Classification, Open-Ended Question Answering | Code Available | 2 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition | May 26, 2022 | Action Recognition, Video Recognition | Code Available | 2 |
| Omni-sourced Webly-supervised Learning for Video Recognition | Mar 29, 2020 | Action Classification, Action Recognition | Code Available | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| Video Swin Transformer | Jun 24, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| In Defense of Image Pre-Training for Spatiotemporal Recognition | May 3, 2022 | GPU, STS | Code Available | 1 |
| Large Scale Holistic Video Understanding | Apr 25, 2019 | Action Classification, Action Recognition | Code Available | 1 |
| MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Dec 2, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| Group Contextualization for Video Recognition | Mar 18, 2022 | Action Recognition, Egocentric Activity Recognition | Code Available | 1 |
| Glance and Focus Networks for Dynamic Visual Recognition | Jan 9, 2022 | image-classification, Image Classification | Code Available | 1 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Aug 15, 2023 | Decoder, Object | Code Available | 1 |
| Learning Equivariant Representations | Dec 4, 2020 | 3D Shape Classification, General Classification | Code Available | 1 |
| Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers | Aug 25, 2023 | Action Recognition, Object Detection | Code Available | 1 |
| Frozen CLIP Models are Efficient Video Learners | Aug 6, 2022 | Action Classification, Decoder | Code Available | 1 |
| Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Jul 9, 2020 | Few-Shot Image Classification, Few-Shot Learning | Code Available | 1 |
| Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Dec 4, 2023 | Action Detection, Video Recognition | Code Available | 1 |
| Audio-Visual Class-Incremental Learning | Aug 21, 2023 | class-incremental learning, Class Incremental Learning | Code Available | 1 |
| Clean-Label Backdoor Attacks on Video Recognition Models | Mar 6, 2020 | Backdoor Attack, backdoor defense | Code Available | 1 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Apr 20, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Adaptive Focus for Efficient Video Recognition | May 7, 2021 | Computational Efficiency, GPU | Code Available | 1 |
| Improved Residual Networks for Image and Video Recognition | Apr 10, 2020 | Action Recognition, image-classification | Code Available | 1 |
| AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition | May 11, 2021 | Video Recognition | Code Available | 1 |
| CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition | Apr 20, 2020 | Gesture Recognition, Lifelong learning | Code Available | 1 |
| DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | Dec 9, 2021 | Video Recognition | Code Available | 1 |
| Efficient Movie Scene Detection using State-Space Transformers | Dec 29, 2022 | GPU, Scene Segmentation | Code Available | 1 |
| Fast Differentiable Matrix Square Root and Inverse Square Root | Jan 29, 2022 | Style Transfer, Video Recognition | Code Available | 1 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Oct 8, 2023 | Action Recognition, Continual Learning | Code Available | 1 |
| Can An Image Classifier Suffice For Action Recognition? | Jun 26, 2021 | Action Recognition, image-classification | Code Available | 1 |
| DEVIAS: Learning Disentangled Video Representations of Action and Scene | Nov 30, 2023 | Action Recognition, Decoder | Code Available | 1 |
| DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | May 25, 2021 | Action Recognition, Long-range modeling | Code Available | 1 |
| Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning | Nov 30, 2021 | 3D Human Pose Estimation, Camera Calibration | Code Available | 1 |
| Boosting the Transferability of Video Adversarial Examples via Temporal Translation | Oct 18, 2021 | Adversarial Attack, Translation | Code Available | 1 |
| Dynamic Network Quantization for Efficient Video Inference | Aug 23, 2021 | Quantization, Video Recognition | Code Available | 1 |
| AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition | Sep 27, 2022 | Video Recognition | Code Available | 1 |
| Clockwork Convnets for Video Semantic Segmentation | Aug 11, 2016 | Image Segmentation, Scheduling | Code Available | 1 |
| Cluster and Aggregate: Face Recognition with Large Probe Set | Oct 19, 2022 | Face Recognition, Face Verification | Code Available | 1 |
| Efficient Video Transformers with Spatial-Temporal Token Selection | Nov 23, 2021 | Video Recognition | Code Available | 1 |
| Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition | Oct 20, 2020 | Action Recognition, Few Shot Action Recognition | Code Available | 1 |
| Frame Flexible Network | Mar 26, 2023 | Video Recognition | Code Available | 1 |
| Attacking Video Recognition Models with Bullet-Screen Comments | Oct 29, 2021 | Adversarial Attack, Adversarial Attack on Video Classification | Code Available | 1 |
| Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning | Sep 14, 2023 | Transfer Learning, Video Recognition | Code Available | 1 |