Title | Date | Tags | Status
MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning | Feb 27, 2024 | class-incremental learning, Class Incremental Learning | Unverified
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark | Jan 25, 2024 | Decoder, Video Captioning | Unverified
SnapCap: Efficient Snapshot Compressive Video Captioning | Jan 10, 2024 | Compressive Sensing, Video Captioning | Unverified
On Scaling Up a Multilingual Vision and Language Model | Jan 1, 2024 | document understanding, In-Context Learning | Unverified
Retrieval-Augmented Egocentric Video Captioning | Jan 1, 2024 | Representation Learning, Retrieval | Unverified
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | Dec 25, 2023 | Caption Generation, Diversity | Unverified
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | Dec 25, 2023 | Image Generation, Text to Image Generation | Unverified
SOVC: Subject-Oriented Video Captioning | Dec 20, 2023 | Video Captioning | Unverified
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | Dec 12, 2023 | Decoder, Video Captioning | Unverified
Video Summarization: Towards Entity-Aware Captions | Dec 1, 2023 | Image Captioning, Video Captioning | Code Available
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos | Nov 28, 2023 | Dense Video Captioning, Transfer Learning | Unverified
Incorporating granularity bias as the margin into contrastive loss for video captioning | Nov 25, 2023 | Contrastive Learning, Sentence | Unverified
Nepali Video Captioning using CNN-RNN Architecture | Nov 5, 2023 | Video Captioning | Unverified
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | Nov 5, 2023 | Caption Generation, Dense Video Captioning | Unverified
Learning Interactive Real-World Simulators | Oct 9, 2023 | Video Captioning | Unverified
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Oct 7, 2023 | Action Recognition, Multiple-choice | Unverified
IcoCap: Improving Video Captioning by Compounding Images | Oct 5, 2023 | Image Captioning, Video Captioning | Unverified
Human-centric Behavior Description in Videos: New Benchmark and Model | Oct 4, 2023 | Video Captioning | Unverified
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning | Oct 2, 2023 | Decoder, Sentence | Unverified
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges | Sep 25, 2023 | Anomaly Detection, Dense Video Captioning | Unverified
Collaborative Three-Stream Transformers for Video Captioning | Sep 18, 2023 | Sentence, Video Captioning | Unverified
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion | Aug 13, 2023 | Video Captioning | Unverified
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | Jul 5, 2023 | Dense Video Captioning, Language Modelling | Unverified
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos | Jun 27, 2023 | Multi-Task Learning, Scene Understanding | Unverified
Exploring the Role of Audio in Video Captioning | Jun 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code Available
Knowledge Distillation for Efficient Audio-Visual Video Captioning | Jun 16, 2023 | Audio-Visual Video Captioning, Caption Generation | Unverified
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | May 22, 2023 | Question Answering, Retrieval | Unverified
Edit As You Wish: Video Caption Editing with Multi-grained User Control | May 15, 2023 | Attribute, Position | Code Available
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation | May 4, 2023 | Decoder, Question Answering | Unverified
Visual Transformation Telling | May 3, 2023 | Dense Video Captioning, Video Captioning | Code Available
TCR: Short Video Title Generation and Cover Selection with Attention Refinement | Apr 25, 2023 | Video Captioning | Unverified
A Review of Deep Learning for Video Captioning | Apr 22, 2023 | Deep Learning, Dense Video Captioning | Unverified
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | Apr 15, 2023 | Language Modeling, Language Modelling | Unverified
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data | Apr 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval, Decoder | Code Available
SEM-POS: Grammatically and Semantically Correct Video Captioning | Mar 26, 2023 | POS, Video Captioning | Unverified
Text with Knowledge Graph Augmented Transformer for Video Captioning | Mar 22, 2023 | Video Captioning | Unverified
Implicit and Explicit Commonsense for Multi-sentence Video Captioning | Mar 14, 2023 | Imitation Learning, Sentence | Unverified
Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps, Contrastive Learning | Code Available
Models See Hallucinations: Evaluating the Factuality in Video Captioning | Mar 6, 2023 | Text Generation, Video Captioning | Unverified
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training | Feb 20, 2023 | Language Modelling, Object | Unverified
Temporal Perceiving Video-Language Pre-training | Jan 18, 2023 | Action Localization, Contrastive Learning | Unverified
Exploiting Auxiliary Caption for Video Grounding | Jan 15, 2023 | Contrastive Learning, Dense Video Captioning | Unverified
HiVLP: Hierarchical Interactive Video-Language Pre-Training | Jan 1, 2023 | Retrieval, Self-Supervised Learning | Unverified
Exploring Group Video Captioning with Efficient Relational Approximation | Jan 1, 2023 | Video Captioning | Unverified
ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded | Jan 1, 2023 | Action Classification, Action Recognition | Unverified
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | Dec 30, 2022 | cross-modal alignment, TGIF-Action | Unverified
METEOR Guided Divergence for Video Captioning | Dec 20, 2022 | Hierarchical Reinforcement Learning, Scene Understanding | Code Available
Contextual Explainable Video Representation: Human Perception-based Understanding | Dec 12, 2022 | Action Detection, Action Recognition | Code Available