Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation Mar 8, 2024 Articles Hallucination
Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation Jul 12, 2020 Decoder Graph-to-Sequence
Spatio-Temporal Attention Models for Grounded Video Captioning Oct 17, 2016 image-classification Image Classification
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning Feb 27, 2019 Attribute Caption Generation
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation Mar 31, 2020 Knowledge Distillation Object
Spatio-Temporal Ranked-Attention Networks for Video Captioning Jan 17, 2020 Video Captioning
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities Nov 4, 2024 Attribute Descriptive
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training Feb 20, 2023 Language Modelling Object
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding Mar 14, 2025 Denoising Dense Video Captioning
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges Jun 4, 2024 Question Answering Story Generation
Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network Jun 2, 2016 Video Captioning Visual Storytelling
Streaming Dense Video Captioning Apr 1, 2024 Dense Video Captioning Live Video Captioning
Watch It Twice: Video Captioning with a Refocused Video Encoder Jul 21, 2019 Video Captioning
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos Jun 27, 2023 Multi-Task Learning Scene Understanding
SOVC: Subject-Oriented Video Captioning Dec 20, 2023 Video Captioning
Supervising Neural Attention Models for Video Captioning by Human Gaze Data Jul 19, 2017 Descriptive Gaze Prediction
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking Jul 27, 2020 Active Learning Video Captioning
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising Dec 14, 2021 Cross-Modal Retrieval Decoder
CLIP4Caption: CLIP for Video Caption Oct 13, 2021 Decoder Sentence
Weakly Supervised Dense Video Captioning Apr 5, 2017 Dense Video Captioning Language Modeling
Classifier-Guided Captioning Across Modalities Jan 3, 2025 Audio captioning Video Captioning
Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description Jul 1, 2017 Video Captioning Video Description
TCR: Short Video Title Generation and Cover Selection with Attention Refinement Apr 25, 2023 Video Captioning
Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning Jun 14, 2020 Dense Captioning Dense Video Captioning
Technical Report for Soccernet 2023 -- Dense Video Captioning Oct 31, 2024 Dense Video Captioning Video Captioning
Chinese Whispers: Cooperative Paraphrase Acquisition May 1, 2012 Machine Translation Natural Language Inference
Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching May 18, 2021 Caption Generation Cross-Modal Retrieval
Temporally Grounding Natural Sentence in Video Oct 1, 2018 Sentence Video Captioning
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks May 22, 2025 Caption Generation Video Captioning
Temporal Perceiving Video-Language Pre-training Jan 18, 2023 Action Localization Contrastive Learning
A Dataset for Telling the Stories of Social Media Videos Oct 1, 2018 Sentence Video Captioning
Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models Nov 22, 2019 Decoder Video Captioning
Text with Knowledge Graph Augmented Transformer for Video Captioning Mar 22, 2023 Video Captioning
The 8th AI City Challenge Apr 15, 2024 Dense Video Captioning Video Captioning
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning Mar 31, 2025 Video Captioning
The Use of Video Captioning for Fostering Physical Activity Apr 7, 2021 Action Detection object-detection
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning Feb 19, 2025 Knowledge Distillation Object
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation Apr 24, 2025 Caption Generation Dense Video Captioning
Title Generation for User Generated Videos Aug 25, 2016 Sentence Video Captioning
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning Nov 22, 2024 Dense Video Captioning Video Captioning
Adaptive Feature Abstraction for Translating Video to Text Nov 23, 2016 Video Captioning
Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning Jun 19, 2021 Sentence Video Captioning
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset Jun 19, 2024 Language Modeling Language Modelling
Prediction and Description of Near-Future Activities in Video Aug 2, 2019 Prediction Video Captioning
Wolf: Captioning Everything with a World Summarization Framework Jul 26, 2024 Autonomous Driving Mixture-of-Experts
Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications Oct 27, 2020 speech-recognition Speech Recognition
Translating Videos to Natural Language Using Deep Recurrent Neural Networks Dec 15, 2014 Sentence Text Generation
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval Sep 21, 2020 Action Detection Activity Detection
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning Oct 20, 2024 Diagnostic Video Captioning
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges Sep 25, 2023 Anomaly Detection Dense Video Captioning