NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | Aug 18, 2024 | Retrieval, Text Retrieval | Unverified | 0
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach | Aug 14, 2024 | Cross-Modal Retrieval, Language Modeling | Unverified | 0
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics | Aug 5, 2024 | Retrieval, Video Retrieval | Unverified | 0
Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce | Aug 1, 2024 | Graph Matching, Retrieval | Unverified | 0
ExpertAF: Expert Actionable Feedback from Video | Aug 1, 2024 | Language Modeling, Language Modelling | Unverified | 0
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval | Jul 23, 2024 | Retrieval, Sign Language Retrieval | Code Available | 0
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Jul 23, 2024 | Re-Ranking, Retrieval | Code Available | 1
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval | Jul 22, 2024 | All, Retrieval | Unverified | 0
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline | Jul 17, 2024 | Question Answering, Retrieval | Unverified | 0
EA-VTR: Event-Aware Video-Text Retrieval | Jul 10, 2024 | Action Recognition, Contrastive Learning | Unverified | 0
MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning | Jul 4, 2024 | Language Modeling, Language Modelling | Code Available | 0
Referring Atomic Video Action Recognition | Jul 2, 2024 | Action Localization, Action Recognition | Code Available | 1
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling | Jun 25, 2024 | Cross-Modal Retrieval, Natural Language Queries | Unverified | 0
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | Jun 21, 2024 | Retrieval, Sentence | Unverified | 0
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | Jun 19, 2024 | Language Modeling, Language Modelling | Unverified | 0
Explore the Limits of Omni-modal Pretraining at Scale | Jun 13, 2024 | Language Modeling, Language Modelling | Code Available | 2
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model | Jun 2, 2024 | Action Recognition, Temporal Action Localization | Unverified | 0
Uncertainty-aware sign language video retrieval with probability distribution modeling | May 30, 2024 | Retrieval, Sign Language Retrieval | Unverified | 0
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter | May 29, 2024 | Natural Language Queries, parameter-efficient fine-tuning | Unverified | 0
GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval | May 22, 2024 | Partially Relevant Video Retrieval, Retrieval | Code Available | 1
Text-Video Retrieval with Global-Local Semantic Consistent Learning | May 21, 2024 | Concept Alignment, Retrieval | Code Available | 1
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | Apr 29, 2024 | Image Retrieval, Language Modeling | Unverified | 0
Learning text-to-video retrieval from image captioning | Apr 26, 2024 | Image Captioning, Image Retrieval | Unverified | 0
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval | Apr 22, 2024 | Retrieval, Video Retrieval | Unverified | 0
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval | Apr 18, 2024 | Diversity, Retrieval | Unverified | 0
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval | Mar 26, 2024 | Multimodal Reasoning, Retrieval | Unverified | 0
Composed Video Retrieval via Enriched Context and Discriminative Embeddings | Mar 25, 2024 | Composed Video Retrieval (CoVR), Retrieval | Code Available | 2
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World | Mar 24, 2024 | Action Anticipation, Action Quality Assessment | Code Available | 2
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Mar 20, 2024 | Action Recognition, Computational Efficiency | Code Available | 2
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Feb 21, 2024 | Moment Retrieval, Retrieval | Code Available | 0
Event-aware Video Corpus Moment Retrieval | Feb 21, 2024 | Contrastive Learning, Moment Retrieval | Unverified | 0
Video Editing for Video Retrieval | Feb 4, 2024 | Retrieval, Text Retrieval | Unverified | 0
Multi-granularity Correspondence Learning from Long-term Noisy Videos | Jan 30, 2024 | Action Segmentation, Long Video Retrieval (Background Removed) | Code Available | 2
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | Jan 22, 2024 | AudioCaps, Audio-Visual Synchronization | Unverified | 0
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval | Jan 19, 2024 | Retrieval, Video Retrieval | Code Available | 1
Distilling Vision-Language Models on Millions of Videos | Jan 11, 2024 | Language Modeling, Language Modelling | Unverified | 0
Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks | Jan 6, 2024 | Retrieval, Variational Inference | Unverified | 0
Detours for Navigating Instructional Videos | Jan 3, 2024 | 16k, Question Answering | Unverified | 0
Holistic Features are almost Sufficient for Text-to-Video Retrieval | Jan 1, 2024 | Retrieval, text similarity | Code Available | 1
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning | Jan 1, 2024 | Representation Learning, Retrieval | Code Available | 1
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision | Dec 20, 2023 | Action Classification, Attribute | Unverified | 0
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Dec 16, 2023 | Video Captioning, video narration captioning | Code Available | 1
Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval | Dec 15, 2023 | All, Image Retrieval | Code Available | 1
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge | Dec 15, 2023 | Information Retrieval, Knowledge Distillation | Code Available | 0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | Dec 10, 2023 | Language Modeling, Language Modelling | Unverified | 0
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval | Dec 1, 2023 | Image Retrieval, Partially Relevant Video Retrieval | Unverified | 0
RTQ: Rethinking Video-language Understanding Based on Image-text Model | Dec 1, 2023 | Video Captioning, Video Question Answering | Code Available | 1
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval | Nov 30, 2023 | Benchmarking, Retrieval | Unverified | 0
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding | Nov 30, 2023 | Form, Video Retrieval | Unverified | 0