Perception Encoder: The best visual embeddings are not at the output of the network | Apr 17, 2025 | Depth Estimation, Language Modeling
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Jul 12, 2023 | Fairness, Image Classification
A Survey on Visual Mamba | Apr 24, 2024 | Image Registration, Image Restoration
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake Detection, Mamba
Video Swin Transformer | Jun 24, 2021 | Action Classification, Action Recognition
UniFormer: Unifying Convolution and Self-attention for Visual Recognition | Jan 24, 2022 | Image Classification, object-detection
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning | Feb 9, 2024 | Active Learning, Video Classification
Gramian Multimodal Representation Learning and Alignment | Dec 16, 2024 | Contrastive Learning, Representation Learning
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Jul 4, 2022 | Action Classification, Action Recognition
Is Space-Time Attention All You Need for Video Understanding? | Feb 9, 2021 | Action Classification, Action Recognition
Temporal Segment Networks for Action Recognition in Videos | May 8, 2017 | Action Classification, Action Recognition
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Jun 9, 2022 | Image Captioning, Image Classification
Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs? | Apr 10, 2020 | General Classification, Open-Ended Question Answering
X3D: Expanding Architectures for Efficient Video Recognition | Apr 9, 2020 | Action Classification, feature selection
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories | Nov 30, 2023 | GPU, Object
Deep Temporal Linear Encoding Networks | Nov 21, 2016 | Representation Learning, Video Classification
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior | Mar 17, 2020 | Adversarial Attack, Video Classification
Learning To Recognize Procedural Activities with Distant Supervision | Jan 26, 2022 | Action Classification, Language Modelling
Learning Implicit Temporal Alignment for Few-shot Video Classification | May 11, 2021 | Action Recognition In Videos, Classification
Long Movie Clip Classification with State-Space Video Models | Apr 4, 2022 | Classification, Decoder
MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Jul 20, 2020 | Action Classification, Action Recognition
Convolutional Spiking Neural Networks for Spatio-Temporal Feature Extraction | Mar 27, 2020 | Activity Recognition In Videos, Event data classification
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization | May 1, 2022 | Action Localization, Data Augmentation
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks | Nov 28, 2017 | Action Recognition, Philosophy
CT-Net: Channel Tensorization Network for Video Classification | Jun 3, 2021 | Action Classification, Action Recognition
A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark | Oct 24, 2021 | Classification, Meta-Learning
A Multigrid Method for Efficiently Training Video Models | Dec 2, 2019 | Action Detection, Action Recognition
Making a Case for 3D Convolutions for Object Segmentation in Videos | Aug 26, 2020 | Decoder, Segmentation
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics | Aug 30, 2024 | Form, Video Classification
Is normalization indispensable for training deep neural network? | Dec 1, 2020 | General Classification, image-classification
Key-frame Guided Network for Thyroid Nodule Recognition using Ultrasound Videos | Jun 27, 2022 | Video Classification
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Dec 2, 2021 | Action Classification, Action Recognition
Large Scale Holistic Video Understanding | Apr 25, 2019 | Action Classification, Action Recognition
Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments | Jul 21, 2022 | General Classification, Video Classification
Large-Scale Video Classification with Convolutional Neural Networks | Jun 23, 2014 | Action Recognition, Classification
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset | Jul 21, 2022 | Fine-Grained Visual Categorization, Video Classification
A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition | Nov 16, 2022 | Action Recognition, Data Augmentation
Billion-scale semi-supervised learning for image classification | May 2, 2019 | Classification, General Classification
Approximated Bilinear Modules for Temporal Modeling | Jul 25, 2020 | Action Recognition, Video Classification
A Spatio-temporal Attention-based Model for Infant Movement Assessment from Videos | May 20, 2021 | Video Classification
Home Action Genome: Cooperative Compositional Action Understanding | May 11, 2021 | Action Recognition, Action Understanding
EEG-based Emotional Video Classification via Learning Connectivity Structure | May 28, 2019 | Classification, EEG
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Dec 21, 2023 | Image Retrieval, Image-to-Text Retrieval
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Jul 9, 2020 | Few-Shot Image Classification, Few-Shot Learning
Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural Networks | Sep 17, 2020 | Inductive Bias, Object
Learning Video Context as Interleaved Multimodal Sequences | Jul 31, 2024 | Language Modeling, Language Modelling
Adaptive Token Sampling For Efficient Vision Transformers | Nov 30, 2021 | Efficient ViTs, image-classification
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification | Mar 29, 2022 | Representation Learning, Video Classification
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games | Aug 25, 2021 | Benchmarking, Video Classification