| Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Dec 4, 2023 | Action Detection, Video Recognition | Code Available | 1 | 5 |
| MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | Jan 20, 2022 | Action Anticipation, Action Classification | Code Available | 1 | 5 |
| Audio-Visual Class-Incremental Learning | Aug 21, 2023 | class-incremental learning, Class Incremental Learning | Code Available | 1 | 5 |
| 0-MMS: Zero-Shot Multi-Motion Segmentation With A Monocular Event Camera | Jun 11, 2020 | Motion Compensation, Motion Segmentation | Code Available | 1 | 5 |
| Multiscale Vision Transformers | Apr 22, 2021 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Deep Feature Flow for Video Recognition | Nov 23, 2016 | Video Recognition, Video Semantic Segmentation | Code Available | 1 | 5 |
| Pooling by Sliced-Wasserstein Embedding | Dec 1, 2021 | Graph Learning, image-classification | Code Available | 1 | 5 |
| No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | May 14, 2024 | Action Detection, GPU | Code Available | 1 | 5 |
| PAVE: Patching and Adapting Video Large Language Models | Mar 25, 2025 | Audio-visual Question Answering, Multi-Task Learning | Code Available | 1 | 5 |
| Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Jul 9, 2020 | Few-Shot Image Classification, Few-Shot Learning | Code Available | 1 | 5 |
| Piano Skills Assessment | Jan 13, 2021 | Action Quality Assessment, Audio Classification | Code Available | 1 | 5 |
| Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation | Aug 8, 2023 | Video Recognition | Code Available | 1 | 5 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Nov 30, 2023 | Descriptive, Language Modelling | Code Available | 1 | 5 |
| BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Mar 26, 2025 | Video Recognition | Code Available | 1 | 5 |
| Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing | Sep 30, 2020 | Action Classification, Video Recognition | Code Available | 1 | 5 |
| AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition | Dec 28, 2021 | Computational Efficiency, Diversity | Code Available | 1 | 5 |
| PatchNet -- Short-range Template Matching for Efficient Video Processing | Mar 10, 2021 | Object, object-detection | Code Available | 1 | 5 |
| Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning | Sep 14, 2023 | Transfer Learning, Video Recognition | Code Available | 1 | 5 |
| DEVIAS: Learning Disentangled Video Representations of Action and Scene | Nov 30, 2023 | Action Recognition, Decoder | Code Available | 1 | 5 |
| Frozen CLIP Models are Efficient Video Learners | Aug 6, 2022 | Action Classification, Decoder | Code Available | 1 | 5 |
| AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition | Sep 27, 2022 | Video Recognition | Code Available | 1 | 5 |
| DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | May 25, 2021 | Action Recognition, Long-range modeling | Code Available | 1 | 5 |
| DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | Dec 9, 2021 | Video Recognition | Code Available | 1 | 5 |
| Boosting the Transferability of Video Adversarial Examples via Temporal Translation | Oct 18, 2021 | Adversarial Attack, Translation | Code Available | 1 | 5 |
| Dynamic Network Quantization for Efficient Video Inference | Aug 23, 2021 | Quantization, Video Recognition | Code Available | 1 | 5 |
| FrameExit: Conditional Early Exiting for Efficient Video Recognition | Apr 27, 2021 | Video Recognition, Video Understanding | Code Available | 1 | 5 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Oct 8, 2023 | Action Recognition, Continual Learning | Code Available | 1 | 5 |
| Glance and Focus Networks for Dynamic Visual Recognition | Jan 9, 2022 | image-classification, Image Classification | Code Available | 1 | 5 |
| In Defense of Image Pre-Training for Spatiotemporal Recognition | May 3, 2022 | GPU, STS | Code Available | 1 | 5 |
| Efficient Movie Scene Detection using State-Space Transformers | Dec 29, 2022 | GPU, Scene Segmentation | Code Available | 1 | 5 |
| Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning | Nov 30, 2021 | 3D Human Pose Estimation, Camera Calibration | Code Available | 1 | 5 |
| Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition | Jun 20, 2020 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Efficient Video Transformers with Spatial-Temporal Token Selection | Nov 23, 2021 | Video Recognition | Code Available | 1 | 5 |
| CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition | Apr 20, 2020 | Gesture Recognition, Lifelong learning | Code Available | 1 | 5 |
| MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Dec 2, 2021 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Apr 20, 2023 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Adversarial Bipartite Graph Learning for Video Domain Adaptation | Jul 31, 2020 | Domain Adaptation, Graph Learning | Code Available | 1 | 5 |
| Clean-Label Backdoor Attacks on Video Recognition Models | Mar 6, 2020 | Backdoor Attack, backdoor defense | Code Available | 1 | 5 |
| Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition | Oct 20, 2020 | Action Recognition, Few Shot Action Recognition | Code Available | 1 | 5 |
| Frame Flexible Network | Mar 26, 2023 | Video Recognition | Code Available | 1 | 5 |
| Cluster and Aggregate: Face Recognition with Large Probe Set | Oct 19, 2022 | Face Recognition, Face Verification | Code Available | 1 | 5 |
| Learning Equivariant Representations | Dec 4, 2020 | 3D Shape Classification, General Classification | Code Available | 1 | 5 |
| Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | Aug 6, 2020 | Action Recognition In Videos, Contrastive Learning | Code Available | 1 | 5 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | Jun 18, 2021 | Action Recognition, Action Recognition In Videos | Code Available | 1 | 5 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Aug 15, 2023 | Decoder, Object | Code Available | 1 | 5 |
| Fast Differentiable Matrix Square Root and Inverse Square Root | Jan 29, 2022 | Style Transfer, Video Recognition | Code Available | 1 | 5 |
| Look More but Care Less in Video Recognition | Nov 18, 2022 | Action Recognition, Video Recognition | Code Available | 1 | 5 |
| Making Vision Transformers Efficient from A Token Sparsification View | Mar 15, 2023 | Efficient ViTs, image-classification | Code Available | 1 | 5 |
| Attacking Video Recognition Models with Bullet-Screen Comments | Oct 29, 2021 | Adversarial Attack, Adversarial Attack on Video Classification | Code Available | 1 | 5 |
| Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks | Feb 12, 2020 | Action Classification, Classification | Code Available | 1 | 5 |