| ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video | Oct 2, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning | Sep 14, 2023 | Transfer Learning, Video Recognition | Code Available | 1 |
| Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer | Sep 11, 2023 | Multi-Task Learning, Video Recognition | Unverified | 0 |
| Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving | Sep 8, 2023 | All, Autonomous Driving | Unverified | 0 |
| Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers | Aug 25, 2023 | Action Recognition, Object Detection | Code Available | 1 |
| Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition | Aug 22, 2023 | Multiview Learning, Video Recognition | Code Available | 0 |
| Audio-Visual Class-Incremental Learning | Aug 21, 2023 | class-incremental learning, Class Incremental Learning | Code Available | 1 |
| Temporal-Distributed Backdoor Attack Against Video Based Action Recognition | Aug 21, 2023 | Action Recognition, Backdoor Attack | Unverified | 0 |
| Audio-Visual Glance Network for Efficient Video Recognition | Aug 18, 2023 | Video Recognition, Video Understanding | Unverified | 0 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Aug 15, 2023 | Decoder, Object | Code Available | 1 |
| Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | Aug 14, 2023 | Video Recognition, Zero-Shot Action Recognition | Code Available | 0 |
| On the Importance of Spatial Relations for Few-shot Action Recognition | Aug 14, 2023 | Action Recognition, Few-Shot action recognition | Unverified | 0 |
| View while Moving: Efficient Video Recognition in Long-untrimmed Videos | Aug 9, 2023 | Video Recognition | Unverified | 0 |
| Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation | Aug 8, 2023 | Video Recognition | Code Available | 1 |
| What Can Simple Arithmetic Operations Do for Temporal Modeling? | Jul 18, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | Jul 13, 2023 | Action Recognition, Temporal Action Localization | Code Available | 1 |
| TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter | Jun 22, 2023 | Question Answering, Retrieval | Code Available | 0 |
| Enhanced Multimodal Representation Learning with Cross-modal KD | Jun 13, 2023 | Contrastive Learning, Emotion Classification | Unverified | 0 |
| A two-way translation system of Chinese sign language based on computer vision | Jun 3, 2023 | Sentence, Sign Language Recognition | Unverified | 0 |
| Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | Jun 1, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition | May 22, 2023 | Action Recognition, Decoder | Unverified | 0 |
| Inter-frame Accelerate Attack against Video Interpolation Models | May 11, 2023 | Adversarial Robustness, Video Frame Interpolation | Unverified | 0 |
| Multi-object Video Generation from Single Frame Layouts | May 6, 2023 | Image Generation, Object | Unverified | 0 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Apr 20, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Use Your Head: Improving Long-Tail Video Recognition | Apr 3, 2023 | Video Recognition | Code Available | 0 |
| Frame Flexible Network | Mar 26, 2023 | Video Recognition | Code Available | 1 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | Mar 23, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Efficient Decision-based Black-box Patch Attacks on Video Recognition | Mar 21, 2023 | Video Recognition | Unverified | 0 |
| Video Action Recognition with Attentive Semantic Units | Mar 17, 2023 | Action Recognition, Decoder | Unverified | 0 |
| MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | Mar 15, 2023 | Action Recognition, Few-Shot action recognition | Code Available | 1 |
| Making Vision Transformers Efficient from A Token Sparsification View | Mar 15, 2023 | Efficient ViTs, image-classification | Code Available | 1 |
| MRET: Multi-resolution Transformer for Video Quality Assessment | Mar 13, 2023 | Video Quality Assessment, Video Recognition | Unverified | 0 |
| Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition | Mar 5, 2023 | Action Recognition, Computational Efficiency | Code Available | 0 |
| Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks | Feb 24, 2023 | Classification, Data Augmentation | Unverified | 0 |
| Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization | Feb 1, 2023 | Action Recognition, Continual Learning | Code Available | 1 |
| Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring | Jan 26, 2023 | Representation Learning, Retrieval | Code Available | 1 |
| Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos | Jan 3, 2023 | Action Recognition, Adversarial Robustness | Code Available | 0 |
| Tiny Updater: Towards Efficient Neural Network-Driven Software Updating | Jan 1, 2023 | Efficient Neural Network, image-classification | Code Available | 0 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Dec 31, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| Efficient Movie Scene Detection using State-Space Transformers | Dec 29, 2022 | GPU, Scene Segmentation | Code Available | 1 |
| Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition | Dec 5, 2022 | Tensor Decomposition, Video Recognition | Unverified | 0 |
| VLG: General Video Recognition with Web Textual Knowledge | Dec 3, 2022 | Video Recognition | Code Available | 1 |
| SVFormer: Semi-supervised Video Transformer for Action Recognition | Nov 23, 2022 | Action Recognition, image-classification | Code Available | 1 |
| Look More but Care Less in Video Recognition | Nov 18, 2022 | Action Recognition, Video Recognition | Code Available | 1 |
| Temporal superimposed crossover module for effective continuous sign language | Nov 7, 2022 | image-classification, Image Classification | Code Available | 0 |
| Cluster and Aggregate: Face Recognition with Large Probe Set | Oct 19, 2022 | Face Recognition, Face Verification | Code Available | 1 |
| Towards a Unified View on Visual Parameter-Efficient Transfer Learning | Oct 3, 2022 | Action Recognition, Image Classification | Code Available | 1 |
| REST: REtrieve & Self-Train for generative action recognition | Sep 29, 2022 | Action Recognition, Caption Generation | Unverified | 0 |
| AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition | Sep 27, 2022 | Video Recognition | Code Available | 1 |
| Rethinking Resolution in the Context of Efficient Video Recognition | Sep 26, 2022 | Knowledge Distillation, Video Recognition | Code Available | 1 |