COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | Nov 1, 2020 | Cross-Modal Retrieval Representation Learning | Code Available | 1
Self-supervised Co-training for Video Representation Learning | Oct 19, 2020 | Action Recognition Contrastive Learning | Code Available | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Apr 1, 2021 | Retrieval Text Retrieval | Code Available | 1
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | Jun 15, 2023 | Form model | Code Available | 1
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Jul 9, 2020 | Few-Shot Image Classification Few-Shot Learning | Code Available | 1
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos | Jun 16, 2020 | Automatic Speech Recognition Automatic Speech Recognition (ASR) | Code Available | 1
CoVR-2: Automatic Data Construction for Composed Video Retrieval | Aug 28, 2023 | Composed Image Retrieval (CoIR) Composed Video Retrieval (CoVR) | Code Available | 1
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | Jun 18, 2021 | Action Recognition Action Recognition In Videos | Code Available | 1
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data | Mar 14, 2022 | Articles Retrieval | Code Available | 1
GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval | May 22, 2024 | Partially Relevant Video Retrieval Retrieval | Code Available | 1
Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval | Dec 15, 2023 | All Image Retrieval | Code Available | 1
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Nov 27, 2023 | Action Classification Action Recognition | Code Available | 1
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation | Jun 8, 2021 | Multi-Task Learning Question Answering | Code Available | 1
Cross-Architecture Self-supervised Video Representation Learning | May 26, 2022 | Action Recognition Contrastive Learning | Code Available | 1
Cross-Modal Adapter for Text-Video Retrieval | Nov 17, 2022 | parameter-efficient fine-tuning Retrieval | Code Available | 1
StableFusion: Continual Video Retrieval via Frame Adaptation | Mar 13, 2025 | Continual Learning Mixture-of-Experts | Code Available | 1
Cross Modal Retrieval with Querybank Normalisation | Dec 23, 2021 | Cross-Modal Retrieval Metric Learning | Code Available | 1
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | May 1, 2020 | Language Modeling Language Modelling | Code Available | 1
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Sep 4, 2022 | Fill Mask Optical Flow Estimation | Code Available | 1
Hierarchical Video-Moment Retrieval and Step-Captioning | Mar 29, 2023 | Information Retrieval Moment Retrieval | Code Available | 1
DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization | Jun 1, 2021 | Question Answering Retrieval | Code Available | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition Linear evaluation | Code Available | 1
Holistic Features are almost Sufficient for Text-to-Video Retrieval | Jan 1, 2024 | Retrieval text similarity | Code Available | 1
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Oct 9, 2024 | Retrieval Text Retrieval | Code Available | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Oct 8, 2023 | Action Recognition Continual Learning | Code Available | 1
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Jun 7, 2019 | Action Localization Long Video Retrieval (Background Removed) | Code Available | 1
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Oct 7, 2023 | Automatic Speech Recognition Video Captioning | Code Available | 1
Event-aware Video Corpus Moment Retrieval | Feb 21, 2024 | Contrastive Learning Moment Retrieval | Unverified | 0
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | Jan 22, 2024 | AudioCaps Audio-Visual Synchronization | Unverified | 0
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | Apr 29, 2024 | Image Retrieval Language Modeling | Unverified | 0
Enhanced Multimodal Representation Learning with Cross-modal KD | Jun 13, 2023 | Contrastive Learning Emotion Classification | Unverified | 0
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency | Jun 4, 2021 | Action Recognition Representation Learning | Unverified | 0
End-to-end Generative Pretraining for Multimodal Video Captioning | Jan 20, 2022 | Action Classification Decoder | Unverified | 0
Coarse to Fine: Video Retrieval before Moment Localization | Oct 14, 2021 | Moment Retrieval Retrieval | Unverified | 0
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | Oct 10, 2016 | Language Modeling Language Modelling | Unverified | 0
Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval | Sep 30, 2020 | Retrieval Video Retrieval | Unverified | 0
CNN Retrieval based Unsupervised Metric Learning for Near-Duplicated Video Retrieval | May 30, 2021 | Metric Learning Re-Ranking | Unverified | 0
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding | Jun 7, 2023 | Retrieval Sentence | Unverified | 0
Empowering Agentic Video Analytics Systems with Video Language Models | May 1, 2025 | Knowledge Graphs RAG | Unverified | 0
Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures | Jun 15, 2016 | Clustering Retrieval | Unverified | 0
CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture | May 3, 2025 | Autonomous Driving Benchmarking | Unverified | 0
A Review of Deep Learning for Video Captioning | Apr 22, 2023 | Deep Learning Dense Video Captioning | Unverified | 0
Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract | May 10, 2019 | Image Retrieval Retrieval | Unverified | 0
Action in Mind: A Neural Network Approach to Action Recognition and Segmentation | Apr 30, 2021 | Action Recognition Action Segmentation | Unverified | 0
Efficient Action Detection in Untrimmed Videos via Multi-Task Learning | Dec 22, 2016 | Action Detection Action Localization | Unverified | 0
CLOP: Video-and-Language Pre-Training with Knowledge Regularizations | Nov 7, 2022 | Contrastive Learning Retrieval | Unverified | 0
MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed | Jun 11, 2025 | Retrieval Video Retrieval | Unverified | 0
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | Dec 2, 2022 | Image-text Retrieval Retrieval | Unverified | 0
A Proposal-based Approach for Activity Image-to-Video Retrieval | Nov 24, 2019 | Cross-Modal Retrieval Retrieval | Unverified | 0
EA-VTR: Event-Aware Video-Text Retrieval | Jul 10, 2024 | Action Recognition Contrastive Learning | Unverified | 0