Movie101: A New Movie Understanding Benchmark | May 20, 2023 | Video Captioning | Code Available 1
Edit As You Wish: Video Caption Editing with Multi-grained User Control | May 15, 2023 | Attribute, Position | Code Available 0
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation | May 4, 2023 | Decoder, Question Answering | Unverified 0
Visual Transformation Telling | May 3, 2023 | Dense Video Captioning, Video Captioning | Code Available 0
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder, Image Captioning | Code Available 1
TCR: Short Video Title Generation and Cover Selection with Attention Refinement | Apr 25, 2023 | Video Captioning | Unverified 0
A Review of Deep Learning for Video Captioning | Apr 22, 2023 | Deep Learning, Dense Video Captioning | Unverified 0
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Apr 17, 2023 | Audio captioning, Audio-Video Question Answering (AVQA) | Code Available 2
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | Apr 15, 2023 | Language Modeling, Language Modelling | Unverified 0
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries | Apr 10, 2023 | Dense Video Captioning, Video Captioning | Code Available 2
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions | Apr 9, 2023 | Video Captioning | Code Available 2
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data | Apr 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Hierarchical Video-Moment Retrieval and Step-Captioning | Mar 29, 2023 | Information Retrieval, Moment Retrieval | Code Available 1
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Mar 29, 2023 | Cross-Modal Retrieval, Decoder | Code Available 0
Fine-grained Audible Video Description | Mar 27, 2023 | Language Modeling, Language Modelling | Code Available 1
SEM-POS: Grammatically and Semantically Correct Video Captioning | Mar 26, 2023 | POS, Video Captioning | Unverified 0
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation | Mar 26, 2023 | Video Captioning | Code Available 1
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Mar 23, 2023 | Auxiliary Learning, Multimodal Sentiment Analysis | Code Available 1
Text with Knowledge Graph Augmented Transformer for Video Captioning | Mar 22, 2023 | Video Captioning | Unverified 0
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation | Mar 21, 2023 | Contrastive Learning, Image Captioning | Code Available 1
Action knowledge for video captioning with graph neural networks | Mar 16, 2023 | Action Recognition, Graph Neural Network | Code Available 1
Implicit and Explicit Commonsense for Multi-sentence Video Captioning | Mar 14, 2023 | Imitation Learning, Sentence | Unverified 0
Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps, Contrastive Learning | Code Available 0
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation | Mar 11, 2023 | Image Captioning, Image to text | Code Available 1
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | Mar 11, 2023 | Dense Video Captioning, Natural Language Moment Retrieval | Code Available 1
Models See Hallucinations: Evaluating the Factuality in Video Captioning | Mar 6, 2023 | Text Generation, Video Captioning | Unverified 0
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | Feb 27, 2023 | Dense Video Captioning, Language Modeling | Code Available 2
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training | Feb 20, 2023 | Language Modelling, Object | Unverified 0
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification, Image Classification | Code Available 4
Temporal Perceiving Video-Language Pre-training | Jan 18, 2023 | Action Localization, Contrastive Learning | Unverified 0
Exploiting Auxiliary Caption for Video Grounding | Jan 15, 2023 | Contrastive Learning, Dense Video Captioning | Unverified 0
HiVLP: Hierarchical Interactive Video-Language Pre-Training | Jan 1, 2023 | Retrieval, Self-Supervised Learning | Unverified 0
ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded | Jan 1, 2023 | Action Classification, Action Recognition | Unverified 0
Exploring Group Video Captioning with Efficient Relational Approximation | Jan 1, 2023 | Video Captioning | Unverified 0
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? | Dec 31, 2022 | Data Augmentation, Retrieval | Code Available 2
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | Dec 30, 2022 | cross-modal alignment, TGIF-Action | Unverified 0
METEOR Guided Divergence for Video Captioning | Dec 20, 2022 | Hierarchical Reinforcement Learning, Scene Understanding | Code Available 0
Contextual Explainable Video Representation: Human Perception-based Understanding | Dec 12, 2022 | Action Detection, Action Recognition | Code Available 0
MAViC: Multimodal Active Learning for Video Captioning | Dec 11, 2022 | Active Learning, Decoder | Unverified 0
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | Dec 9, 2022 | Question Answering, Retrieval | Unverified 0
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning | Nov 28, 2022 | Diversity, Sentence | Code Available 1
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Nov 28, 2022 | FAD, Video Captioning | Code Available 0
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning | Nov 22, 2022 | Translation, Video Captioning | Unverified 0
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | Nov 21, 2022 | Contrastive Learning, Representation Learning | Code Available 1
Visual Commonsense-aware Representation Network for Video Captioning | Nov 17, 2022 | Caption Generation, Question Answering | Code Available 1
Event and Entity Extraction from Generated Video Captions | Nov 5, 2022 | Caption Generation, Dense Video Captioning | Code Available 0
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality | Nov 1, 2022 | Data Augmentation, Image Retrieval | Code Available 1
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Oct 17, 2022 | Few-Shot Learning, Image Captioning | Code Available 3
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | Oct 10, 2022 | Retrieval, Text to Video Retrieval | Unverified 0
Thinking Hallucination for Video Captioning | Sep 28, 2022 | Hallucination, Video Captioning | Code Available 1