Perception Encoder: The best visual embeddings are not at the output of the network (Apr 17, 2025). Tasks: Depth Estimation, Language Modeling. Code available (8).
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution (Jul 12, 2023). Tasks: Fairness, Image Classification. Code available (6).
A Survey on Visual Mamba (Apr 24, 2024). Tasks: Image Registration, Image Restoration. Code available (4).
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (Apr 8, 2024). Tasks: GPU, Multiple-choice. Code available (3).
Gramian Multimodal Representation Learning and Alignment (Dec 16, 2024). Tasks: Contrastive Learning, Representation Learning. Code available (2).
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark (May 30, 2024). Tasks: DeepFake Detection, Mamba. Code available (2).
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning (Feb 9, 2024). Tasks: Active Learning, Video Classification. Code available (2).
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition (Jul 4, 2022). Tasks: Action Classification, Action Recognition. Code available (2).
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs (Jun 9, 2022). Tasks: Image Captioning, Image Classification. Code available (2).
UniFormer: Unifying Convolution and Self-attention for Visual Recognition (Jan 24, 2022). Tasks: Image Classification, object-detection. Code available (2).
Video Swin Transformer (Jun 24, 2021). Tasks: Action Classification, Action Recognition. Code available (2).
Is Space-Time Attention All You Need for Video Understanding? (Feb 9, 2021). Tasks: Action Classification, Action Recognition. Code available (2).
Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs? (Apr 10, 2020). Tasks: General Classification, Open-Ended Question Answering. Code available (2).
X3D: Expanding Architectures for Efficient Video Recognition (Apr 9, 2020). Tasks: Action Classification, feature selection. Code available (2).
Temporal Segment Networks for Action Recognition in Videos (May 8, 2017). Tasks: Action Classification, Action Recognition. Code available (2).
Video-GPT via Next Clip Diffusion (May 18, 2025). Tasks: Denoising, Image Animation. Code available (1).
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis (Jan 17, 2025). Tasks: Large Language Model, Multimodal Large Language Model. Code available (1).
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics (Aug 30, 2024). Tasks: Form, Video Classification. Code available (1).
Learning Video Context as Interleaved Multimodal Sequences (Jul 31, 2024). Tasks: Language Modeling, Language Modelling. Code available (1).
MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili (Jul 28, 2024). Tasks: Hate Speech Detection, Video Classification. Code available (1).
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization (Mar 28, 2024). Tasks: Video Classification, Zero-Shot Learning. Code available (1).
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (Dec 21, 2023). Tasks: Image Retrieval, Image-to-Text Retrieval. Code available (1).
Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach (Dec 21, 2023). Tasks: Action Localization, Classification. Code available (1).
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories (Nov 30, 2023). Tasks: GPU, Object. Code available (1).
Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments (Nov 10, 2023). Tasks: Activity Recognition, Autonomous Driving. Code available (1).
MUVF-YOLOX: A Multi-modal Ultrasound Video Fusion Network for Renal Tumor Diagnosis (Jul 15, 2023). Tasks: Video Classification. Code available (1).
HateMM: A Multi-Modal Dataset for Hate Video Classification (May 6, 2023). Tasks: Classification, Hate Speech Detection. Code available (1).
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens (Apr 7, 2023). Tasks: Image Classification, Sparse Representation-based Classification. Code available (1).
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting (Apr 6, 2023). Tasks: Action Recognition, Prompt Learning. Code available (1).
The effectiveness of MAE pre-pretraining for billion-scale pretraining (Mar 23, 2023). Tasks: Action Classification, Action Recognition. Code available (1).
Towards Activated Muscle Group Estimation in the Wild (Mar 2, 2023). Tasks: Activity Recognition, Human Activity Recognition. Code available (1).
Reversible Vision Transformers (Feb 9, 2023). Tasks: GPU, image-classification. Code available (1).
Efficient Movie Scene Detection using State-Space Transformers (Dec 29, 2022). Tasks: GPU, Scene Segmentation. Code available (1).
A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition (Nov 16, 2022). Tasks: Action Recognition, Data Augmentation. Code available (1).
Overlooked Video Classification in Weakly Supervised Video Anomaly Detection (Oct 13, 2022). Tasks: All, Anomaly Detection. Code available (1).
TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance (Sep 26, 2022). Tasks: image-classification, Image Classification. Code available (1).
SSIVD-Net: A Novel Salient Super Image Classification & Detection Technique for Weaponized Violence (Jul 26, 2022). Tasks: Action Recognition, image-classification. Code available (1).
Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments (Jul 21, 2022). Tasks: General Classification, Video Classification. Code available (1).
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset (Jul 21, 2022). Tasks: Fine-Grained Visual Categorization, Video Classification. Code available (1).
Temporal and cross-modal attention for audio-visual zero-shot learning (Jul 20, 2022). Tasks: GZSL Video Classification, Video Classification. Code available (1).
Key-frame Guided Network for Thyroid Nodule Recognition using Ultrasound Videos (Jun 27, 2022). Tasks: Video Classification. Code available (1).
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization (May 1, 2022). Tasks: Action Localization, Data Augmentation. Code available (1).
Attention in Attention: Modeling Context Correlation for Efficient Video Classification (Apr 20, 2022). Tasks: Video Classification. Code available (1).
Long Movie Clip Classification with State-Space Video Models (Apr 4, 2022). Tasks: Classification, Decoder. Code available (1).
StyleFool: Fooling Video Classification Systems via Style Transfer (Mar 30, 2022). Tasks: Adversarial Attack, Classification. Code available (1).
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification (Mar 29, 2022). Tasks: Representation Learning, Video Classification. Code available (1).
Unsupervised Pre-training for Temporal Action Localization Tasks (Mar 25, 2022). Tasks: Action Localization, Contrastive Learning. Code available (1).
A Dataset for Medical Instructional Video Classification and Question Answering (Jan 30, 2022). Tasks: Classification, Question Answering. Code available (1).
Learning To Recognize Procedural Activities with Distant Supervision (Jan 26, 2022). Tasks: Action Classification, Language Modelling. Code available (1).
Progressive Video Summarization via Multimodal Self-supervised Learning (Jan 7, 2022). Tasks: Self-Supervised Learning, Supervised Video Summarization. Code available (1).