E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | Nov 28, 2023 | Language Modeling, Language Modelling | Unverified | 0
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Nov 27, 2023 | Action Classification, Action Recognition | Code Available | 1
VideoCon: Robust Video-Language Alignment via Contrast Captions | Nov 15, 2023 | Language Modeling, Language Modelling | Code Available | 1
Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval | Nov 14, 2023 | Retrieval, Video Retrieval | Unverified | 0
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval | Nov 3, 2023 | Recommendation Systems, Retrieval | Unverified | 0
An Empirical Study of Frame Selection for Text-to-Video Retrieval | Nov 1, 2023 | Retrieval, Text to Video Retrieval | Unverified | 0
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing | Oct 29, 2023 | Contrastive Learning, Retrieval | Unverified | 0
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | Oct 29, 2023 | Form, Language Modelling | Code Available | 1
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval | Oct 23, 2023 | Contrastive Learning, Retrieval | Code Available | 0
Videoprompter: an ensemble of foundational models for zero-shot video understanding | Oct 23, 2023 | Action Recognition, Descriptive | Unverified | 0
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval | Oct 12, 2023 | Retrieval, Semantic Retrieval | Unverified | 0
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Oct 8, 2023 | Action Recognition, Continual Learning | Code Available | 1
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval | Oct 8, 2023 | Partially Relevant Video Retrieval, Retrieval | Code Available | 1
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Oct 7, 2023 | Action Recognition, Multiple-choice | Code Available | 0
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Oct 7, 2023 | Automatic Speech Recognition, Video Captioning | Code Available | 1
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Sep 29, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 1
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning | Sep 20, 2023 | Contrastive Learning, Retrieval | Code Available | 1
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval | Sep 20, 2023 | Retrieval, Video Retrieval | Unverified | 0
Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Sep 18, 2023 | Retrieval, Text Retrieval | Code Available | 1
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention | Sep 17, 2023 | Action Recognition, Graph Generation | Unverified | 0
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval | Sep 16, 2023 | Retrieval, Style Transfer | Code Available | 1
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval | Sep 15, 2023 | Retrieval, Video Classification | Code Available | 0
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains | Sep 1, 2023 | Change Point Detection, Instruction Following | Code Available | 0
CoVR-2: Automatic Data Construction for Composed Video Retrieval | Aug 28, 2023 | Composed Image Retrieval (CoIR), Composed Video Retrieval (CoVR) | Code Available | 1
Simple Baselines for Interactive Video Retrieval with Questions and Answers | Aug 21, 2023 | Question Answering, Retrieval | Code Available | 1
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval | Aug 15, 2023 | Retrieval, Video Captioning | Code Available | 1
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval | Aug 2, 2023 | Retrieval, text similarity | Unverified | 0
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Jul 27, 2023 | Automatic Speech Recognition, Contrastive Learning | Code Available | 1
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | Jul 24, 2023 | Retrieval, Text to Video Retrieval | Unverified | 0
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model | Jul 24, 2023 | Anomaly Detection, Retrieval | Code Available | 1
Fine-grained Text-Video Retrieval with Frozen Image Encoders | Jul 14, 2023 | Decoder, Retrieval | Unverified | 0
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | Jul 13, 2023 | Retrieval, Video Generation | Code Available | 2
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Jul 13, 2023 | Action Recognition, Contrastive Learning | Code Available | 0
MultiVENT: Multilingual Videos of Events with Aligned Natural Text | Jul 6, 2023 | Information Retrieval, Retrieval | Unverified | 0
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Jun 28, 2023 | Retrieval, Video Retrieval | Code Available | 0
An overview on the evaluated video retrieval tasks at TRECVID 2022 | Jun 22, 2023 | Ad-hoc video search, Retrieval | Code Available | 1
Key Frame Extraction with Attention Based Deep Neural Networks | Jun 21, 2023 | Video Retrieval, Video Summarization | Unverified | 0
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code Available | 0
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | Jun 15, 2023 | Form, model | Code Available | 1
Enhanced Multimodal Representation Learning with Cross-modal KD | Jun 13, 2023 | Contrastive Learning, Emotion Classification | Unverified | 0
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding | Jun 7, 2023 | Retrieval, Sentence | Unverified | 0
An Overview of Challenges in Egocentric Text-Video Retrieval | Jun 7, 2023 | Retrieval, Video Retrieval | Unverified | 0
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs | May 31, 2023 | Action Recognition, Autonomous Vehicles | Unverified | 0
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code Available | 2
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition | May 29, 2023 | Action Recognition, Autonomous Vehicles | Unverified | 0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | May 22, 2023 | Question Answering, Retrieval | Unverified | 0
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment | May 20, 2023 | Retrieval, Video Retrieval | Code Available | 1
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | May 13, 2023 | Retrieval, Text Retrieval | Unverified | 0
A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension | May 5, 2023 | Reading Comprehension, Retrieval | Code Available | 1
A Review of Deep Learning for Video Captioning | Apr 22, 2023 | Deep Learning, Dense Video Captioning | Unverified | 0