Learning video retrieval models with relevance-aware online mining (Mar 16, 2022). Tasks: Multi-Instance Retrieval, Retrieval. Code available.
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval (Jun 24, 2021). Tasks: Computational Efficiency, Knowledge Distillation. Code available.
Referring Atomic Video Action Recognition (Jul 2, 2024). Tasks: Action Localization, Action Recognition. Code available.
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval (Dec 3, 2021). Tasks: Ad-hoc video search, feature selection. Code available.
Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval (Jan 1, 2023). Tasks: Knowledge Distillation, Language Modelling. Code available.
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning (Sep 20, 2023). Tasks: Contrastive Learning, Retrieval. Code available.
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP (Jun 21, 2021). Tasks: Language Modeling, Language Modelling. Code available.
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts (May 20, 2025). Tasks: Caption Generation, Retrieval. Code available.
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound (Apr 6, 2022). Tasks: Retrieval, Text to Video Retrieval. Code available.
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (Apr 18, 2021). Tasks: Retrieval, Text Retrieval. Code available.
Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval (Jan 23, 2022). Tasks: Representation Learning, Retrieval. Code available.
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring (Jan 26, 2023). Tasks: Representation Learning, Retrieval. Code available.
A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension (May 5, 2023). Tasks: Reading Comprehension, Retrieval. Code available.
Memory-augmented Dense Predictive Coding for Video Representation Learning (Aug 3, 2020). Tasks: Action Classification, Action Recognition. Code available.
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval (Jul 23, 2024). Tasks: Re-Ranking, Retrieval. Code available.
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training (May 1, 2020). Tasks: Language Modeling, Language Modelling. Code available.
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval (Sep 21, 2021). Tasks: Corpus Video Moment Retrieval, Moment Retrieval. Code available.
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval (Apr 26, 2022). Tasks: Action Recognition, Retrieval. Code available.
Align and Prompt: Video-and-Language Pre-training with Entity Prompts (Dec 17, 2021). Tasks: cross-modal alignment, Entity Alignment. Code available.
Multimedia Retrieval Through Unsupervised Hypergraph-Based Manifold Ranking (Dec 1, 2019). Tasks: Content-Based Image Retrieval, Retrieval. Code available.
Multi-modal Transformer for Video Retrieval (Jul 21, 2020). Tasks: Natural Language Queries, Retrieval. Code available.
End-to-End Learning of Visual Representations from Uncurated Instructional Videos (Dec 13, 2019). Tasks: Action Localization, Action Recognition. Code available.
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant (Nov 30, 2021). Tasks: Question Answering, Retrieval. Code available.
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (Aug 20, 2024). Tasks: Mamba, Natural Language Queries. Code available.
CoCa: Contrastive Captioners are Image-Text Foundation Models (May 4, 2022). Tasks: Action Classification, Decoder. Code available.
Normalized Contrastive Learning for Text-Video Retrieval (Nov 30, 2022). Tasks: Contrastive Learning, Cross-Modal Retrieval. Code available.
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval (Aug 15, 2023). Tasks: Retrieval, Video Captioning. Code available.
Revisiting the "Video" in Video-Language Understanding (Jun 3, 2022). Tasks: Benchmarking, Question Answering. Code available.
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation (Jul 9, 2020). Tasks: Few-Shot Image Classification, Few-Shot Learning. Code available.
Condensed Movies: Story Based Retrieval with Contextual Embeddings (May 8, 2020). Tasks: Retrieval, Text to Video Retrieval. Code available.
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures (Jul 27, 2023). Tasks: Automatic Speech Recognition, Contrastive Learning. Code available.
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations (Nov 21, 2022). Tasks: Contrastive Learning, Representation Learning. Code available.
3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval (Nov 10, 2022). Tasks: Retrieval, Self-Supervised Learning. Code available.
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval (Jan 1, 2023). Tasks: Diversity, Object. Code available.
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval (Jan 1, 2022). Tasks: Action Localization, Retrieval. Code available.
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval (Oct 8, 2023). Tasks: Partially Relevant Video Retrieval, Retrieval. Code available.
Temporal Context Aggregation for Video Retrieval with Contrastive Learning (Aug 4, 2020). Tasks: Contrastive Learning, Representation Learning. Code available.
GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval (May 22, 2024). Tasks: Partially Relevant Video Retrieval, Retrieval. Code available.
Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning (Oct 17, 2020). Tasks: Retrieval, Transfer Learning. Code available.
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval (Sep 29, 2023). Tasks: Cross-Modal Retrieval, Image-text matching. Code available.
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval (Dec 8, 2021). Tasks: Action Localization, Retrieval. Code available.
A Straightforward Framework For Video Retrieval Using CLIP (Feb 24, 2021). Tasks: Retrieval, Video Retrieval. Code available.
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions (Nov 19, 2021). Tasks: Retrieval, Super-Resolution. Code available.
Hierarchical Video-Moment Retrieval and Step-Captioning (Mar 29, 2023). Tasks: Information Retrieval, Moment Retrieval. Code available.
Contrastive Masked Autoencoders for Self-Supervised Video Hashing (Nov 21, 2022). Tasks: Decoder, Retrieval. Code available.
Holistic Features are almost Sufficient for Text-to-Video Retrieval (Jan 1, 2024). Tasks: Retrieval, text similarity. Code available.
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips (Jun 7, 2019). Tasks: Action Localization, Long Video Retrieval (Background Removed). Code available.
Florence: A New Foundation Model for Computer Vision (Nov 22, 2021). Tasks: Action Classification, Action Recognition. Code available.
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale (Oct 7, 2023). Tasks: Automatic Speech Recognition, Video Captioning. Code available.
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (Apr 1, 2021). Tasks: Retrieval, Text Retrieval. Code available.