Temporal Perceiving Video-Language Pre-training Jan 18, 2023 Action Localization Contrastive Learning
— Unverified 0Text with Knowledge Graph Augmented Transformer for Video Captioning Mar 22, 2023 Video Captioning
— Unverified 0The 8th AI City Challenge Apr 15, 2024 Dense Video Captioning Video Captioning
— Unverified 0The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning Mar 31, 2025 Video Captioning
— Unverified 0The Use of Video Captioning for Fostering Physical Activity Apr 7, 2021 Action Detection object-detection
— Unverified 0TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation Apr 24, 2025 Caption Generation Dense Video Captioning
— Unverified 0Title Generation for User Generated Videos Aug 25, 2016 Sentence Video Captioning
— Unverified 0Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning Jun 19, 2021 Sentence Video Captioning
— Unverified 0Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset Jun 19, 2024 Language Modeling Language Modelling
— Unverified 0Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications Oct 27, 2020 speech-recognition Speech Recognition
— Unverified 0TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval Sep 21, 2020 Action Detection Activity Detection
— Unverified 0Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges Sep 25, 2023 Anomaly Detection Dense Video Captioning
— Unverified 0Understanding Action Sequences based on Video Captioning for Learning-from-Observation Dec 9, 2020 Video Captioning Video Understanding
— Unverified 0Variational Stacked Local Attention Networks for Diverse Video Captioning Jan 4, 2022 Decoder Diversity
— Unverified 0VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning Oct 13, 2019 Video Captioning
— Unverified 0Vector Learning for Cross Domain Representations Sep 27, 2018 Decoder Image Captioning
— Unverified 0VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks Jun 10, 2025 Multiple-choice Open-Ended Question Answering
— Unverified 0ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation Dec 12, 2024 Phrase Grounding Question Answering
— Unverified 0Video Captioning: a comparative review of where we are and which could be the route Apr 12, 2022 Video Captioning
— Unverified 0Video Captioning in Compressed Video Jan 2, 2021 Caption Generation Video Captioning
— Unverified 0Video Captioning Using Weak Annotation Sep 2, 2020 Sentence Video Captioning
— Unverified 0Video Captioning via Hierarchical Reinforcement Learning Nov 29, 2017 Hierarchical Reinforcement Learning reinforcement-learning
— Unverified 0Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion Aug 13, 2023 Video Captioning
— Unverified 0Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction Jul 8, 2018 Decoder Language Modeling
— Unverified 0Video Captioning with Guidance of Multimodal Latent Topics Aug 31, 2017 Caption Generation Decoder
— Unverified 0Video Captioning with Multi-Faceted Attention Dec 1, 2016 Information Retrieval Retrieval
— Unverified 0Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning Nov 5, 2019 Sentence Video Captioning
— Unverified 0Video Captioning with Transferred Semantic Attributes Nov 23, 2016 Sentence Video Captioning
— Unverified 0Video LLMs for Temporal Reasoning in Long Videos Dec 4, 2024 Action Segmentation Dense Video Captioning
— Unverified 0VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation May 4, 2023 Decoder Question Answering
— Unverified 0Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks Oct 26, 2015 Sentence Video Captioning
— Unverified 0VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners Dec 9, 2022 Question Answering Retrieval
— Unverified 0Vision and Language: from Visual Perception to Content Creation Dec 26, 2019 Decoder Question Answering
— Unverified 0Visual-aware Attention Dual-stream Decoder for Video Captioning Oct 16, 2021 Decoder Video Captioning
— Unverified 0VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending May 22, 2023 Question Answering Retrieval
— Unverified 0Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding Mar 14, 2025 Denoising Dense Video Captioning
— Unverified 0Watch It Twice: Video Captioning with a Refocused Video Encoder Jul 21, 2019 Video Captioning
— Unverified 0Weakly Supervised Dense Video Captioning Apr 5, 2017 Dense Video Captioning Language Modeling
— Unverified 0Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching May 18, 2021 Caption Generation Cross-Modal Retrieval
— Unverified 0Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning Nov 22, 2024 Dense Video Captioning Video Captioning
— Unverified 0Wolf: Captioning Everything with a World Summarization Framework Jul 26, 2024 Autonomous Driving Mixture-of-Experts
— Unverified 0Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment Jul 5, 2023 Dense Video Captioning Language Modelling
— Unverified 0Dual-path Collaborative Generation Network for Emotional Video Captioning Aug 6, 2024 Caption Generation Video Captioning
Code Code Available 0Reconstruction Network for Video Captioning Mar 30, 2018 Decoder Sentence
Code Code Available 0Live Video Captioning Jun 20, 2024 Dense Video Captioning Live Video Captioning
Code Code Available 0Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos Apr 28, 2022 Action Understanding Video Captioning
Code Code Available 0Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning Nov 28, 2022 FAD Video Captioning
Code Code Available 0Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning Apr 15, 2018 Video Captioning Video Understanding
Code Code Available 0Translating Videos to Natural Language Using Deep Recurrent Neural Networks Dec 15, 2014 Sentence Text Generation
Code Code Available 0Cross-Modal Graph with Meta Concepts for Video Captioning Aug 14, 2021 object-detection Object Detection
Code Code Available 0