Perception Encoder: The best visual embeddings are not at the output of the network (Apr 17, 2025). Tasks: Depth Estimation, Language Modeling. Code Available (8)
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution (Jul 12, 2023). Tasks: Fairness, Image Classification. Code Available (6)
A Survey on Visual Mamba (Apr 24, 2024). Tasks: Image Registration, Image Restoration. Code Available (4)
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (Apr 8, 2024). Tasks: GPU, Multiple-choice. Code Available (3)
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition (Jul 4, 2022). Tasks: Action Classification, Action Recognition. Code Available (2)
X3D: Expanding Architectures for Efficient Video Recognition (Apr 9, 2020). Tasks: Action Classification, feature selection. Code Available (2)
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs (Jun 9, 2022). Tasks: Image Captioning, Image Classification. Code Available (2)
Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs? (Apr 10, 2020). Tasks: General Classification, Open-Ended Question Answering. Code Available (2)
Video Swin Transformer (Jun 24, 2021). Tasks: Action Classification, Action Recognition. Code Available (2)
UniFormer: Unifying Convolution and Self-attention for Visual Recognition (Jan 24, 2022). Tasks: Image Classification, object-detection. Code Available (2)
Gramian Multimodal Representation Learning and Alignment (Dec 16, 2024). Tasks: Contrastive Learning, Representation Learning. Code Available (2)
Temporal Segment Networks for Action Recognition in Videos (May 8, 2017). Tasks: Action Classification, Action Recognition. Code Available (2)
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning (Feb 9, 2024). Tasks: Active Learning, Video Classification. Code Available (2)
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark (May 30, 2024). Tasks: DeepFake Detection, Mamba. Code Available (2)
Is Space-Time Attention All You Need for Video Understanding? (Feb 9, 2021). Tasks: Action Classification, Action Recognition. Code Available (2)
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories (Nov 30, 2023). Tasks: GPU, Object. Code Available (1)
Long Movie Clip Classification with State-Space Video Models (Apr 4, 2022). Tasks: Classification, Decoder. Code Available (1)
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior (Mar 17, 2020). Tasks: Adversarial Attack, Video Classification. Code Available (1)
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks (Nov 28, 2017). Tasks: Action Recognition, Philosophy. Code Available (1)
Large-Scale Video Classification with Convolutional Neural Networks (Jun 23, 2014). Tasks: Action Recognition, Classification. Code Available (1)
Learning To Recognize Procedural Activities with Distant Supervision (Jan 26, 2022). Tasks: Action Classification, Language Modelling. Code Available (1)
MotionSqueeze: Neural Motion Feature Learning for Video Understanding (Jul 20, 2020). Tasks: Action Classification, Action Recognition. Code Available (1)
Learning Video Context as Interleaved Multimodal Sequences (Jul 31, 2024). Tasks: Language Modeling, Language Modelling. Code Available (1)
Key-frame Guided Network for Thyroid Nodule Recognition using Ultrasound Videos (Jun 27, 2022). Tasks: Video Classification. Code Available (1)
Active Contrastive Learning of Audio-Visual Video Representations (Aug 31, 2020). Tasks: Contrastive Learning, Representation Learning. Code Available (1)
Learning Implicit Temporal Alignment for Few-shot Video Classification (May 11, 2021). Tasks: Action Recognition In Videos, Classification. Code Available (1)
A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark (Oct 24, 2021). Tasks: Classification, Meta-Learning. Code Available (1)
A Multigrid Method for Efficiently Training Video Models (Dec 2, 2019). Tasks: Action Detection, Action Recognition. Code Available (1)
Making a Case for 3D Convolutions for Object Segmentation in Videos (Aug 26, 2020). Tasks: Decoder, Segmentation. Code Available (1)
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection (Dec 2, 2021). Tasks: Action Classification, Action Recognition. Code Available (1)
Large Scale Holistic Video Understanding (Apr 25, 2019). Tasks: Action Classification, Action Recognition. Code Available (1)
Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments (Jul 21, 2022). Tasks: General Classification, Video Classification. Code Available (1)
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation (Jul 9, 2020). Tasks: Few-Shot Image Classification, Few-Shot Learning. Code Available (1)
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games (Aug 25, 2021). Tasks: Benchmarking, Video Classification. Code Available (1)
Efficient Movie Scene Detection using State-Space Transformers (Dec 29, 2022). Tasks: GPU, Scene Segmentation. Code Available (1)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (Dec 21, 2023). Tasks: Image Retrieval, Image-to-Text Retrieval. Code Available (1)
Deep Temporal Linear Encoding Networks (Nov 21, 2016). Tasks: Representation Learning, Video Classification. Code Available (1)
CT-Net: Channel Tensorization Network for Video Classification (Jun 3, 2021). Tasks: Action Classification, Action Recognition. Code Available (1)
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization (May 1, 2022). Tasks: Action Localization, Data Augmentation. Code Available (1)
Approximated Bilinear Modules for Temporal Modeling (Jul 25, 2020). Tasks: Action Recognition, Video Classification. Code Available (1)
A Spatio-temporal Attention-based Model for Infant Movement Assessment from Videos (May 20, 2021). Tasks: Video Classification. Code Available (1)
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset (Jul 21, 2022). Tasks: Fine-Grained Visual Categorization, Video Classification. Code Available (1)
A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition (Nov 16, 2022). Tasks: Action Recognition, Data Augmentation. Code Available (1)
HateMM: A Multi-Modal Dataset for Hate Video Classification (May 6, 2023). Tasks: Classification, Hate Speech Detection. Code Available (1)
Home Action Genome: Cooperative Compositional Action Understanding (May 11, 2021). Tasks: Action Recognition, Action Understanding. Code Available (1)
Compact Generalized Non-local Network (Oct 31, 2018). Tasks: Object Detection, Object Recognition. Code Available (1)
Convolutional Spiking Neural Networks for Spatio-Temporal Feature Extraction (Mar 27, 2020). Tasks: Activity Recognition In Videos, Event data classification. Code Available (1)
Adaptive Token Sampling For Efficient Vision Transformers (Nov 30, 2021). Tasks: Efficient ViTs, image-classification. Code Available (1)
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification (Mar 29, 2022). Tasks: Representation Learning, Video Classification. Code Available (1)
Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification (Dec 1, 2020). Tasks: 3D Architecture, Action Recognition. Code Available (1)