Title | Date | Tasks | Code status
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | Jul 5, 2023 | Dense Video Captioning, Language Modelling | Unverified
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data | Nov 17, 2015 | Image Captioning, Novel Concepts | Code Available
Live Video Captioning | Jun 20, 2024 | Dense Video Captioning, Live Video Captioning | Code Available
Video captioning with stacked attention and semantic hard pull | Sep 15, 2020 | Decoder, Video Captioning | Code Available
Event and Entity Extraction from Generated Video Captions | Nov 5, 2022 | Caption Generation, Dense Video Captioning | Code Available
Joint Event Detection and Description in Continuous Video Streams | Feb 28, 2018 | Dense Captioning, Dense Video Captioning | Code Available
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Dec 17, 2024 | Dense Video Captioning, Descriptive | Code Available
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention | Sep 7, 2021 | Sensor Fusion, Video Captioning | Code Available
https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Video Captioning, Video Description | Code Available
FocusedAD: Character-centric Movie Audio Description | Apr 16, 2025 | Video Captioning | Code Available
Screencast Tutorial Video Understanding | Jun 1, 2020 | object-detection, Object Detection | Code Available
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Nov 28, 2022 | FAD, Video Captioning | Code Available
Reconstruction Network for Video Captioning | Mar 30, 2018 | Decoder, Sentence | Code Available
Video Summarization: Towards Entity-Aware Captions | Dec 1, 2023 | Image Captioning, Video Captioning | Code Available
Sketch, Ground, and Refine: Top-Down Dense Video Captioning | Jun 19, 2021 | Dense Video Captioning, Sentence | Code Available
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Nov 6, 2024 | Video Captioning | Code Available
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | May 19, 2025 | Video Captioning | Code Available
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Apr 9, 2021 | Language Modelling, Multiple-choice | Code Available
Pretrained Image-Text Models are Secretly Video Captioners | Feb 19, 2025 | Image Captioning, Video Captioning | Code Available
SoccerNet 2024 Challenges Results | Sep 16, 2024 | Action Spotting, Dense Video Captioning | Code Available
OSVidCap: A Framework for the Simultaneous Recognition and Description of Concurrent Actions in Videos in an Open-Set Scenario | Sep 29, 2021 | Decoder, Open Set Video Captioning | Code Available
Oracle performance for visual captioning | Nov 14, 2015 | Image Captioning, Language Modeling | Code Available
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning | Apr 15, 2018 | Video Captioning, Video Understanding | Code Available
Cross-Modal Graph with Meta Concepts for Video Captioning | Aug 14, 2021 | object-detection, Object Detection | Code Available
A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks | May 20, 2019 | Machine Translation, Translation | Code Available
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Dec 16, 2024 | Informativeness, Large Language Model | Code Available
OmniNet: A unified architecture for multi-modal multi-task learning | Jul 17, 2019 | Image Captioning, Multi-Task Learning | Code Available
Cross-Modal and Hierarchical Modeling of Video and Text | Oct 16, 2018 | Action Recognition, Retrieval | Code Available
Excitation Backprop for RNNs | Nov 18, 2017 | Action Recognition, Temporal Action Localization | Code Available
Enriching Video Captions With Contextual Text | Jul 29, 2020 | Video Captioning | Code Available
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Sep 13, 2022 | Image Generation, Story Continuation | Code Available
End-to-End Video Captioning with Multitask Reinforcement Learning | Mar 21, 2018 | GPU, reinforcement-learning | Code Available
End-to-End Dense Video Captioning with Masked Transformer | Apr 3, 2018 | Decoder, Dense Video Captioning | Code Available
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Jul 30, 2024 | Semantic Role Labeling, Video Captioning | Code Available
Streamlined Dense Video Captioning | Apr 8, 2019 | Dense Video Captioning, Reinforcement Learning | Code Available
VideoBERT: A Joint Model for Video and Language Representation Learning | Apr 3, 2019 | Action Classification, General Classification | Code Available
ActBERT: Learning Global-Local Video-Text Representations | Nov 14, 2020 | Action Segmentation, Question Answering | Code Available
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Aug 27, 2019 | Caption Generation, Decoder | Code Available
Support-set based Multi-modal Representation Enhancement for Video Captioning | May 19, 2022 | Video Captioning | Code Available
Non-Autoregressive Coarse-to-Fine Video Captioning | Nov 27, 2019 | Sentence, Video Captioning | Code Available
M-VAD Names: a Dataset for Video Captioning with Naming | Mar 4, 2019 | TAG, Video Captioning | Code Available
Syntax Customized Video Captioning by Imitating Exemplar Sentences | Dec 2, 2021 | Decoder, Diversity | Code Available
Multi-attention Networks for Temporal Localization of Video-level Labels | Nov 15, 2019 | Action Recognition, Temporal Action Localization | Code Available
Visual Transformation Telling | May 3, 2023 | Dense Video Captioning, Video Captioning | Code Available
A Survey of Video Datasets for Grounded Event Understanding | Jun 14, 2024 | Common Sense Reasoning, Event Extraction | Code Available
Continual and Multi-Task Architecture Search | Jun 12, 2019 | Continual Learning, General Classification | Code Available
Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps, Contrastive Learning | Code Available
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description | Sep 19, 2018 | Decoder, Video Captioning | Code Available
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning | May 3, 2019 | Decoder, Sentence | Code Available
Contextual Explainable Video Representation: Human Perception-based Understanding | Dec 12, 2022 | Action Detection, Action Recognition | Code Available