| Title | Date | Tasks | Code |
| --- | --- | --- | --- |
| Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Feb 11, 2025 | Contrastive Learning, Image Retrieval | No |
| HORUS: Multimodal Large Language Models Framework for Video Retrieval at VBS 2025 | Jan 1, 2025 | Image Retrieval, Retrieval | No |
| CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | Dec 31, 2024 | Retrieval, Text Retrieval | No |
| Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Dec 30, 2024 | Contrastive Learning, Question Answering | Yes |
| PolySmart @ TRECVid 2024 Medical Video Question Answering | Dec 20, 2024 | Question Answering, Retrieval | No |
| Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning | Dec 18, 2024 | Moment Retrieval, Multi-Task Learning | No |
| Generative Semantic Communication: Architectures, Technologies, and Applications | Dec 11, 2024 | Retrieval, Semantic Communication | No |
| Multimodal Contextualized Support for Enhancing Video Retrieval System | Dec 10, 2024 | object-detection, Object Detection | No |
| ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | Oct 29, 2024 | Retrieval, Text to Video Retrieval | Yes |
| Generating Signed Language Instructions in Large-Scale Dialogue Systems | Oct 17, 2024 | Retrieval, Text Generation | Yes |
| MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval | Oct 15, 2024 | Descriptive, Retrieval | No |
| VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Oct 1, 2024 | Hallucination, text similarity | No |
| TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm | Sep 30, 2024 | Retrieval, Video Retrieval | Yes |
| Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding | Sep 29, 2024 | Diversity, Question Answering | No |
| Unfolding Videos Dynamics via Taylor Expansion | Sep 4, 2024 | Action Detection, Action Recognition | No |
| Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | Sep 2, 2024 | Video Alignment, Video Editing | No |
| NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | Aug 18, 2024 | Retrieval, Text Retrieval | No |
| Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach | Aug 14, 2024 | Cross-Modal Retrieval, Language Modeling | No |
| Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics | Aug 5, 2024 | Retrieval, Video Retrieval | No |
| Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce | Aug 1, 2024 | Graph Matching, Retrieval | No |
| ExpertAF: Expert Actionable Feedback from Video | Aug 1, 2024 | Language Modeling, Language Modelling | No |
| SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval | Jul 23, 2024 | Retrieval, Sign Language Retrieval | Yes |
| Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval | Jul 22, 2024 | All, Retrieval | No |
| MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline | Jul 17, 2024 | Question Answering, Retrieval | No |
| EA-VTR: Event-Aware Video-Text Retrieval | Jul 10, 2024 | Action Recognition, Contrastive Learning | No |
| MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning | Jul 4, 2024 | Language Modeling, Language Modelling | Yes |
| ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling | Jun 25, 2024 | Cross-Modal Retrieval, Natural Language Queries | No |
| Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | Jun 21, 2024 | Retrieval, Sentence | No |
| Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | Jun 19, 2024 | Language Modeling, Language Modelling | No |
| RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model | Jun 2, 2024 | Action Recognition, Temporal Action Localization | No |
| Uncertainty-aware sign language video retrieval with probability distribution modeling | May 30, 2024 | Retrieval, Sign Language Retrieval | No |
| RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter | May 29, 2024 | Natural Language Queries, parameter-efficient fine-tuning | No |
| Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | Apr 29, 2024 | Image Retrieval, Language Modeling | No |
| Learning text-to-video retrieval from image captioning | Apr 26, 2024 | Image Captioning, Image Retrieval | No |
| SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval | Apr 22, 2024 | Retrieval, Video Retrieval | No |
| ProTA: Probabilistic Token Aggregation for Text-Video Retrieval | Apr 18, 2024 | Diversity, Retrieval | No |
| Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval | Mar 26, 2024 | Multimodal Reasoning, Retrieval | No |
| Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Feb 21, 2024 | Moment Retrieval, Retrieval | Yes |
| Event-aware Video Corpus Moment Retrieval | Feb 21, 2024 | Contrastive Learning, Moment Retrieval | No |
| Video Editing for Video Retrieval | Feb 4, 2024 | Retrieval, Text Retrieval | No |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | Jan 22, 2024 | AudioCaps, Audio-Visual Synchronization | No |
| Distilling Vision-Language Models on Millions of Videos | Jan 11, 2024 | Language Modeling, Language Modelling | No |
| Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks | Jan 6, 2024 | Retrieval, Variational Inference | No |
| Detours for Navigating Instructional Videos | Jan 3, 2024 | 16k, Question Answering | No |
| No More Shortcuts: Realizing the Potential of Temporal Self-Supervision | Dec 20, 2023 | Action Classification, Attribute | No |
| WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge | Dec 15, 2023 | Information Retrieval, Knowledge Distillation | Yes |
| Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | Dec 10, 2023 | Language Modeling, Language Modelling | No |
| Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval | Dec 1, 2023 | Image Retrieval, Partially Relevant Video Retrieval | No |
| A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval | Nov 30, 2023 | Benchmarking, Retrieval | No |
| Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding | Nov 30, 2023 | Form, Video Retrieval | No |