E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | Nov 28, 2023 | Language Modeling, Language Modelling | Unverified
Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval | Nov 14, 2023 | Retrieval, Video Retrieval | Unverified
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval | Nov 3, 2023 | Recommendation Systems, Retrieval | Unverified
An Empirical Study of Frame Selection for Text-to-Video Retrieval | Nov 1, 2023 | Retrieval, Text to Video Retrieval | Unverified
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing | Oct 29, 2023 | Contrastive Learning, Retrieval | Unverified
Videoprompter: an ensemble of foundational models for zero-shot video understanding | Oct 23, 2023 | Action Recognition, Descriptive | Unverified
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval | Oct 23, 2023 | Contrastive Learning, Retrieval | Code available
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval | Oct 12, 2023 | Retrieval, Semantic Retrieval | Unverified
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Oct 7, 2023 | Action Recognition, Multiple-choice | Code available
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval | Sep 20, 2023 | Retrieval, Video Retrieval | Unverified
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention | Sep 17, 2023 | Action Recognition, Graph Generation | Unverified
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval | Sep 15, 2023 | Retrieval, Video Classification | Code available
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains | Sep 1, 2023 | Change Point Detection, Instruction Following | Code available
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval | Aug 2, 2023 | Retrieval, text similarity | Unverified
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | Jul 24, 2023 | Retrieval, Text to Video Retrieval | Unverified
Fine-grained Text-Video Retrieval with Frozen Image Encoders | Jul 14, 2023 | Decoder, Retrieval | Unverified
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Jul 13, 2023 | Action Recognition, Contrastive Learning | Code available
MultiVENT: Multilingual Videos of Events with Aligned Natural Text | Jul 6, 2023 | Information Retrieval, Retrieval | Unverified
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Jun 28, 2023 | Retrieval, Video Retrieval | Code available
Key Frame Extraction with Attention Based Deep Neural Networks | Jun 21, 2023 | Video Retrieval, Video Summarization | Unverified
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code available
Enhanced Multimodal Representation Learning with Cross-modal KD | Jun 13, 2023 | Contrastive Learning, Emotion Classification | Unverified
An Overview of Challenges in Egocentric Text-Video Retrieval | Jun 7, 2023 | Retrieval, Video Retrieval | Unverified
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding | Jun 7, 2023 | Retrieval, Sentence | Unverified
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs | May 31, 2023 | Action Recognition, Autonomous Vehicles | Unverified
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition | May 29, 2023 | Action Recognition, Autonomous Vehicles | Unverified
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | May 22, 2023 | Question Answering, Retrieval | Unverified
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | May 13, 2023 | Retrieval, Text Retrieval | Unverified
A Review of Deep Learning for Video Captioning | Apr 22, 2023 | Deep Learning, Dense Video Captioning | Unverified
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | Apr 15, 2023 | Language Modeling, Language Modelling | Unverified
Perfect Match in Video Retrieval | Mar 29, 2023 | Retrieval, Video Retrieval | Unverified
Free-Form Multi-Modal Multimedia Retrieval (4MR) | Mar 29, 2023 | Form, Management | Unverified
Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification, Action Recognition | Code available
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval | Mar 28, 2023 | Action Recognition, Contrastive Learning | Unverified
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding | Mar 28, 2023 | Action Localization, Action Recognition | Unverified
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations | Mar 24, 2023 | Contrastive Learning, Image Retrieval | Code available
Dialogue-to-Video Retrieval | Mar 23, 2023 | Recommendation Systems, Retrieval | Code available
Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps, Contrastive Learning | Code available
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | Mar 10, 2023 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | Unverified
Improving Video Retrieval by Adaptive Margin | Mar 9, 2023 | Retrieval, Video Retrieval | Unverified
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training | Feb 20, 2023 | Language Modelling, Object | Unverified
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning | Feb 19, 2023 | Representation Learning, Retrieval | Code available
Is Multimodal Vision Supervision Beneficial to Language? | Feb 10, 2023 | Image Retrieval, Natural Language Understanding | Code available
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer | Feb 4, 2023 | Computational Efficiency, Question Answering | Code available
Zorro: the masked multimodal transformer | Jan 23, 2023 | Audio Tagging, Multimodal Deep Learning | Code available
Temporal Perceiving Video-Language Pre-training | Jan 18, 2023 | Action Localization, Contrastive Learning | Unverified
Learning Trajectory-Word Alignments for Video-Language Tasks | Jan 5, 2023 | Question Answering, Retrieval | Unverified
HiVLP: Hierarchical Interactive Video-Language Pre-Training | Jan 1, 2023 | Retrieval, Self-Supervised Learning | Unverified
PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval | Jan 1, 2023 | Representation Learning, Retrieval | Unverified
Exploring Temporal Concurrency for Video-Language Representation Learning | Jan 1, 2023 | Dynamic Time Warping, Metric Learning | Code available