SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | Nov 21, 2022 | cross-modal alignment, GPU | Unverified 0
Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Nov 21, 2022 | Decoder, Retrieval | Code Available 1
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset | Nov 19, 2022 | Common Sense Reasoning, Graph Embedding | Unverified 0
Cross-Modal Adapter for Text-Video Retrieval | Nov 17, 2022 | parameter-efficient fine-tuning, Retrieval | Code Available 1
3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval | Nov 10, 2022 | Retrieval, Self-Supervised Learning | Code Available 1
CLOP: Video-and-Language Pre-Training with Knowledge Regularizations | Nov 7, 2022 | Contrastive Learning, Retrieval | Unverified 0
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | Oct 21, 2022 | Language Modeling, Language Modelling | Unverified 0
Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames | Oct 16, 2022 | Bilevel Optimization, Retrieval | Code Available 0
Semantic Video Moments Retrieval at Scale: A New Task and a Baseline | Oct 15, 2022 | Retrieval, Video Retrieval | Unverified 0
RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval | Oct 13, 2022 | Contrastive Learning, Retrieval | Code Available 0
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning | Oct 12, 2022 | Contrastive Learning, Form | Code Available 2
Learning to Locate Visual Answer in Video Corpus Using Question | Oct 11, 2022 | Contrastive Learning, Language Modelling | Code Available 0
Contrastive Video-Language Learning with Fine-grained Frame Sampling | Oct 10, 2022 | Question Answering, Representation Learning | Unverified 0
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | Oct 10, 2022 | Retrieval, Text to Video Retrieval | Unverified 0
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval | Oct 9, 2022 | Retrieval, Sentence | Code Available 0
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval | Oct 7, 2022 | Knowledge Distillation, Retrieval | Code Available 1
Event Extraction in Video Transcripts | Oct 1, 2022 | Articles, Event Extraction | Unverified 0
TVLT: Textless Vision-Language Transformer | Sep 28, 2022 | Automatic Speech Recognition (ASR), Image Retrieval | Code Available 1
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | Sep 27, 2022 | Cross-Modal Retrieval, Retrieval | Unverified 0
Multi-Granularity Graph Pooling for Video-based Person Re-Identification | Sep 23, 2022 | Node Clustering, Person Re-Identification | Unverified 0
Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network | Sep 23, 2022 | Person Re-Identification, Retrieval | Unverified 0
Marine Video Kit: A New Marine Video Dataset for Content-based Analysis and Retrieval | Sep 23, 2022 | Retrieval, Video Retrieval | Code Available 1
Semi-automatic Data Annotation System for Multi-Target Multi-Camera Vehicle Tracking | Sep 20, 2022 | Retrieval, Video Retrieval | Unverified 0
Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising | Sep 19, 2022 | Image Retrieval, Retrieval | Unverified 0
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | Unverified 0
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Sep 14, 2022 | Retrieval, Text Retrieval | Code Available 2
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Sep 4, 2022 | Fill Mask, Optical Flow Estimation | Code Available 1
Temporal Contrastive Learning with Curriculum | Sep 2, 2022 | Action Recognition, Contrastive Learning | Unverified 0
Partially Relevant Video Retrieval | Aug 26, 2022 | Moment Retrieval, Multiple Instance Learning | Code Available 1
MuMUR: Multilingual Multimodal Universal Retrieval | Aug 24, 2022 | Image Retrieval, Machine Translation | Unverified 0
STAR-GNN: Spatial-Temporal Video Representation for Content-based Retrieval | Aug 15, 2022 | Graph Neural Network, Representation Learning | Unverified 0
Motion Sensitive Contrastive Learning for Self-supervised Video Representation | Aug 12, 2022 | Contrastive Learning, Representation Learning | Unverified 0
QSAM-Net: Rain streak removal by quaternion neural network with self-attention module | Aug 8, 2022 | Benchmarking, object-detection | Unverified 0
A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval | Aug 3, 2022 | Data Augmentation, Retrieval | Code Available 1
LocVTP: Video-Text Pre-training for Temporal Localization | Jul 21, 2022 | Retrieval, Temporal Localization | Code Available 1
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning | Jul 20, 2022 | Action Recognition, Clustering | Code Available 0
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval | Jul 16, 2022 | Retrieval, Video Retrieval | Code Available 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Jul 16, 2022 | Language Modeling, Language Modelling | Code Available 1
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | Jul 15, 2022 | Contrastive Learning, Retrieval | Code Available 1
LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | Jul 11, 2022 | Representation Learning, Retrieval | Unverified 0
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations | Jul 5, 2022 | Language Modeling, Language Modelling | Code Available 0
Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022 | Jun 29, 2022 | Multi-Instance Retrieval, Retrieval | Code Available 0
Semantic Role Aware Correlation Transformer for Text to Video Retrieval | Jun 26, 2022 | Retrieval, Text to Video Retrieval | Code Available 0
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | Jun 26, 2022 | Mixture-of-Experts, Retrieval | Code Available 0
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos | Jun 25, 2022 | Action Classification, Clustering | Code Available 1
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Jun 14, 2022 | Decoder, Language Modeling | Code Available 1
Revealing Single Frame Bias for Video-and-Language Learning | Jun 7, 2022 | Action Recognition, Fine-grained Action Recognition | Code Available 2
Revisiting the "Video" in Video-Language Understanding | Jun 3, 2022 | Benchmarking, Question Answering | Code Available 1
Cross-Architecture Self-supervised Video Representation Learning | May 26, 2022 | Action Recognition, Contrastive Learning | Code Available 1
VRAG: Region Attention Graphs for Content-Based Video Retrieval | May 18, 2022 | Retrieval, Video Retrieval | Unverified 0