COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Nov 1, 2020 Cross-Modal Retrieval Representation Learning
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling Jun 14, 2022 Decoder Language Modeling
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval Apr 1, 2021 Retrieval Text Retrieval
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model Jun 15, 2023 Form model
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation Jul 9, 2020 Few-Shot Image Classification Few-Shot Learning
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Jun 16, 2020 Automatic Speech Recognition Automatic Speech Recognition (ASR)
CoVR-2: Automatic Data Construction for Composed Video Retrieval Aug 28, 2023 Composed Image Retrieval (CoIR) Composed Video Retrieval (CoVR)
Multi-modal Transformer for Video Retrieval Jul 21, 2020 Natural Language Queries Retrieval
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval Oct 8, 2023 Partially Relevant Video Retrieval Retrieval
GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval May 22, 2024 Partially Relevant Video Retrieval Retrieval
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval Aug 20, 2024 Mamba Natural Language Queries
Multi-Query Video Retrieval Jan 10, 2022 Retrieval Video Retrieval
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval Sep 2, 2024 GPU Retrieval
Cross-Architecture Self-supervised Video Representation Learning May 26, 2022 Action Recognition Contrastive Learning
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition May 4, 2022 Action Recognition Representation Learning
Normalized Contrastive Learning for Text-Video Retrieval Nov 30, 2022 Contrastive Learning Cross-Modal Retrieval
Cross Modal Retrieval with Querybank Normalisation Dec 23, 2021 Cross-Modal Retrieval Metric Learning
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training May 1, 2020 Language Modeling Language Modelling
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling Sep 4, 2022 Fill Mask Optical Flow Estimation
Hierarchical Video-Moment Retrieval and Step-Captioning Mar 29, 2023 Information Retrieval Moment Retrieval
DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization Jun 1, 2021 Question Answering Retrieval
Bridging Video-text Retrieval with Multiple Choice Questions Jan 13, 2022 Action Recognition Linear evaluation
Holistic Features are almost Sufficient for Text-to-Video Retrieval Jan 1, 2024 Retrieval text similarity
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval Oct 9, 2024 Retrieval Text Retrieval
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling Nov 24, 2021 Question Answering Retrieval
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips Jun 7, 2019 Action Localization Long Video Retrieval (Background Removed)
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale Oct 7, 2023 Automatic Speech Recognition Video Captioning
Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs) Mar 21, 2025 Representation Learning Retrieval
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval Apr 20, 2021 Retrieval Video Retrieval
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation Jul 20, 2018 Face Generation Lip Reading
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language Apr 1, 2022 Diversity Image Captioning
Semantic Role Aware Correlation Transformer for Text to Video Retrieval Jun 26, 2022 Retrieval Text to Video Retrieval
Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval Nov 21, 2022 All Retrieval
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer Feb 4, 2023 Computational Efficiency Question Answering
Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames Oct 16, 2022 Bilevel Optimization Retrieval
Accommodating Audio Modality in CLIP for Multimodal Processing Mar 12, 2023 AudioCaps Contrastive Learning
Video Logo Retrieval based on local Features Aug 11, 2018 Image Retrieval Retrieval
ECO: Efficient Convolutional Network for Online Video Understanding Apr 24, 2018 Action Classification Action Recognition
A Joint Sequence Fusion Model for Video Question Answering and Retrieval Aug 7, 2018 Decoder Multiple-choice
Self-supervised Video Representation Learning with Cascade Positive Retrieval Jan 20, 2022 Action Recognition Contrastive Learning
Dual Encoding for Zero-Example Video Retrieval Sep 17, 2018 Ad-hoc video search Retrieval
Self-supervised Video Representation Learning by Context and Motion Decoupling Apr 2, 2021 Action Recognition CPU
SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries Nov 24, 2020 Ad-hoc video search Management
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval Jul 23, 2024 Retrieval Sign Language Retrieval
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers Jun 1, 2018 Copy Detection Retrieval
Discriminative Residual Analysis for Image Set Classification with Posture and Age Variations Aug 23, 2020 General Classification Retrieval
Circulant temporal encoding for video retrieval and temporal alignment Jun 8, 2015 Retrieval Video Retrieval
Central Similarity Quantization for Efficient Image and Video Retrieval Aug 1, 2019 Quantization Retrieval
Rudder: A Cross Lingual Video and Text Retrieval Dataset Mar 9, 2021 Natural Language Queries Retrieval
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval Oct 23, 2023 Contrastive Learning Retrieval