| Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Dec 4, 2023 | Action Detection, Video Recognition | Code Available | 1 |
| Sharing Pain: Using Pain Domain Transfer for Video Recognition of Low Grade Orthopedic Pain in Horses | May 21, 2021 | Action Recognition, Fine-grained Action Recognition | Code Available | 1 |
| Audio-Visual Class-Incremental Learning | Aug 21, 2023 | class-incremental learning, Class Incremental Learning | Code Available | 1 |
| Space-time Mixing Attention for Video Transformer | Jun 10, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Aug 15, 2023 | Decoder, Object | Code Available | 1 |
| Deep Feature Flow for Video Recognition | Nov 23, 2016 | Video Recognition, Video Semantic Segmentation | Code Available | 1 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | Mar 23, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| TokenLearner: Adaptive Space-Time Tokenization for Videos | Dec 1, 2021 | Representation Learning, Video Recognition | Code Available | 1 |
| Learning Versatile Neural Architectures by Propagating Network Codes | Mar 24, 2021 | Image Segmentation, Neural Architecture Search | Code Available | 1 |
| Frozen CLIP Models are Efficient Video Learners | Aug 6, 2022 | Action Classification, Decoder | Code Available | 1 |
| AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition | Sep 27, 2022 | Video Recognition | Code Available | 1 |
| Frame Flexible Network | Mar 26, 2023 | Video Recognition | Code Available | 1 |
| Long Movie Clip Classification with State-Space Video Models | Apr 4, 2022 | Classification, Decoder | Code Available | 1 |
| MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Dec 2, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Mar 26, 2025 | Video Recognition | Code Available | 1 |
| Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing | Sep 30, 2020 | Action Classification, Video Recognition | Code Available | 1 |
| AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition | Dec 28, 2021 | Computational Efficiency, Diversity | Code Available | 1 |
| Improved Residual Networks for Image and Video Recognition | Apr 10, 2020 | Action Recognition, image-classification | Code Available | 1 |
| Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning | Sep 14, 2023 | Transfer Learning, Video Recognition | Code Available | 1 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Apr 20, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| DEVIAS: Learning Disentangled Video Representations of Action and Scene | Nov 30, 2023 | Action Recognition, Decoder | Code Available | 1 |
| DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | May 25, 2021 | Action Recognition, Long-range modeling | Code Available | 1 |
| DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | Dec 9, 2021 | Video Recognition | Code Available | 1 |
| Boosting the Transferability of Video Adversarial Examples via Temporal Translation | Oct 18, 2021 | Adversarial Attack, Translation | Code Available | 1 |
| Dynamic Network Quantization for Efficient Video Inference | Aug 23, 2021 | Quantization, Video Recognition | Code Available | 1 |
| Glance and Focus Networks for Dynamic Visual Recognition | Jan 9, 2022 | image-classification, Image Classification | Code Available | 1 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Oct 8, 2023 | Action Recognition, Continual Learning | Code Available | 1 |
| Learning Equivariant Representations | Dec 4, 2020 | 3D Shape Classification, General Classification | Code Available | 1 |
| Making Vision Transformers Efficient from A Token Sparsification View | Mar 15, 2023 | Efficient ViTs, image-classification | Code Available | 1 |
| Efficient Movie Scene Detection using State-Space Transformers | Dec 29, 2022 | GPU, Scene Segmentation | Code Available | 1 |
| Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning | Nov 30, 2021 | 3D Human Pose Estimation, Camera Calibration | Code Available | 1 |
| MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | Mar 15, 2023 | Action Recognition, Few-Shot action recognition | Code Available | 1 |
| Efficient Video Transformers with Spatial-Temporal Token Selection | Nov 23, 2021 | Video Recognition | Code Available | 1 |
| CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition | Apr 20, 2020 | Gesture Recognition, Lifelong learning | Code Available | 1 |
| Multiscale Vision Transformers | Apr 22, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| MVFNet: Multi-View Fusion Network for Efficient Video Recognition | Dec 13, 2020 | Action Classification, Action Recognition | Code Available | 1 |
| Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers | Aug 25, 2023 | Action Recognition, Object Detection | Code Available | 1 |
| Clean-Label Backdoor Attacks on Video Recognition Models | Mar 6, 2020 | Backdoor Attack, backdoor defense | Code Available | 1 |
| Adversarial Bipartite Graph Learning for Video Domain Adaptation | Jul 31, 2020 | Domain Adaptation, Graph Learning | Code Available | 1 |
| Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition | Oct 20, 2020 | Action Recognition, Few Shot Action Recognition | Code Available | 1 |
| Cluster and Aggregate: Face Recognition with Large Probe Set | Oct 19, 2022 | Face Recognition, Face Verification | Code Available | 1 |
| Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Jul 9, 2020 | Few-Shot Image Classification, Few-Shot Learning | Code Available | 1 |
| Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks | Feb 12, 2020 | Action Classification, Classification | Code Available | 1 |
| PAVE: Patching and Adapting Video Large Language Models | Mar 25, 2025 | Audio-visual Question Answering, Multi-Task Learning | Code Available | 1 |
| FrameExit: Conditional Early Exiting for Efficient Video Recognition | Apr 27, 2021 | Video Recognition, Video Understanding | Code Available | 1 |
| Fast Differentiable Matrix Square Root and Inverse Square Root | Jan 29, 2022 | Style Transfer, Video Recognition | Code Available | 1 |
| Real-time Online Video Detection with Temporal Smoothing Transformers | Sep 19, 2022 | Action Anticipation, Action Detection | Code Available | 1 |
| Rethinking Resolution in the Context of Efficient Video Recognition | Sep 26, 2022 | Knowledge Distillation, Video Recognition | Code Available | 1 |
| Large Scale Holistic Video Understanding | Apr 25, 2019 | Action Classification, Action Recognition | Code Available | 1 |
| In Defense of Image Pre-Training for Spatiotemporal Recognition | May 3, 2022 | GPU, STS | Code Available | 1 |