Comprehensive Information Integration Modeling Framework for Video Titling | Jun 24, 2020 | Descriptive Video Captioning
Code | Code Available | 1 | AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Jun 19, 2024 | Question Answering Spatial Reasoning
Code | Code Available | 1 | Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Apr 22, 2024 | Action Quality Assessment multimodal interaction
Code | Code Available | 1 | Poet: Product-oriented Video Captioner for E-commerce | Aug 16, 2020 | Video Captioning
Code | Code Available | 1 | Improved Actor Relation Graph based Group Activity Recognition | Oct 24, 2020 | Activity Recognition Group Activity Recognition
Code | Code Available | 1 | Movie101: A New Movie Understanding Benchmark | May 20, 2023 | Video Captioning
Code | Code Available | 1 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Mar 23, 2023 | Auxiliary Learning Multimodal Sentiment Analysis
Code | Code Available | 1 | MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration | Apr 17, 2022 | Navigate Retrieval
Code | Code Available | 1 | LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning | Jun 17, 2023 | Boundary Captioning Language Modeling
Code | Code Available | 1 | Learning to Generate Grounded Visual Captions without Localization Supervision | Jun 1, 2019 | Image Captioning Language Modelling
Code | Code Available | 1 | Learning to Generate Grounded Visual Captions without Localization Supervision | Aug 1, 2020 | Image Captioning Language Modelling
Code | Code Available | 1 | MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning | Aug 25, 2023 | Image Captioning Video Captioning
Code | Code Available | 1 | The MSR-Video to Text Dataset with Clean Annotations | Feb 12, 2021 | Sentence Video Captioning
Code | Code Available | 1 | ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation | Mar 11, 2023 | Image Captioning Image to text
Code | Code Available | 1 | COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning Diversity
Code | Code Available | 1 | EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching | Nov 17, 2021 | Language Modelling Video Captioning
Code | Code Available | 1 | LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Jun 14, 2022 | Decoder Language Modeling
Code | Code Available | 1 | Improving Generation and Evaluation of Visual Stories via Semantic Consistency | May 20, 2021 | Image Generation Story Visualization
Code | Code Available | 1 | Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | Mar 11, 2023 | Dense Video Captioning Natural Language Moment Retrieval
Code | Code Available | 1 | Large Scale Holistic Video Understanding | Apr 25, 2019 | Action Classification Action Recognition
Code | Code Available | 1 | Accurate and Fast Compressed Video Captioning | Sep 22, 2023 | Video Captioning
Code | Code Available | 1 | Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Apr 1, 2021 | Retrieval Text Retrieval
Code | Code Available | 1 | Hierarchical Modular Network for Video Captioning | Nov 24, 2021 | Representation Learning Sentence
Code | Code Available | 1 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Oct 7, 2023 | Automatic Speech Recognition Video Captioning
Code | Code Available | 1 | Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Jul 27, 2023 | Automatic Speech Recognition Contrastive Learning
Code | Code Available | 1 | Multi-modal Dense Video Captioning | Mar 17, 2020 | Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code | Code Available | 1 | Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Dec 16, 2023 | Video Captioning video narration captioning
Code | Code Available | 1 | From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder Image Captioning
Code | Code Available | 1 | VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Feb 18, 2025 | Text-to-Video Generation Video Captioning
Code | Code Available | 1 | Hierarchical Video-Moment Retrieval and Step-Captioning | Mar 29, 2023 | Information Retrieval Moment Retrieval
Code | Code Available | 1 | Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation | Aug 17, 2016 | Caption Generation Decoder
Code | Code Available | 1 | IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Sep 26, 2024 | Image Captioning Retrieval
Code | Code Available | 1 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning Large Language Model
- | Unverified | 0 | Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning | Feb 19, 2025 | Knowledge Distillation Object
- | Unverified | 0 | End-to-end Generative Pretraining for Multimodal Video Captioning | Jan 20, 2022 | Action Classification Decoder
- | Unverified | 0 | A Restricted Visual Turing Test for Deep Scene and Event Understanding | Dec 6, 2015 | Question Answering Video Captioning
- | Unverified | 0 | Prediction and Description of Near-Future Activities in Video | Aug 2, 2019 | Prediction Video Captioning
- | Unverified | 0 | End-to-end Dense Video Captioning as Sequence Generation | Apr 18, 2022 | Dense Video Captioning Descriptive
- | Unverified | 0 | End-to-end Dense Video Captioning as Sequence Generation | Jan 16, 2022 | Dense Video Captioning Descriptive
- | Unverified | 0 | Adaptive Feature Abstraction for Translating Video to Text | Nov 23, 2016 | Video Captioning
- | Unverified | 0 | Hierarchical Multimodal Transformer to Summarize Videos | Sep 22, 2021 | Machine Translation Supervised Video Summarization
- | Unverified | 0 | End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | Oct 10, 2016 | Language Modeling Language Modelling
- | Unverified | 0 | Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning | Oct 2, 2023 | Decoder Sentence
- | Unverified | 0 | Empirical Autopsy of Deep Video Captioning Frameworks | Nov 21, 2019 | Decoder Language Modelling
- | Unverified | 0 | FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | Oct 20, 2024 | Diagnostic Video Captioning
- | Unverified | 0 | E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | Nov 16, 2021 | Caption Generation valid
- | Unverified | 0 | Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning | Jun 5, 2017 | Caption Generation Decoder
- | Unverified | 0 | Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives | Dec 14, 2024 | Descriptive Language Modeling
- | Unverified | 0 | AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU Question Answering
- | Unverified | 0 | Hierarchical memory decoder for visual narrating | Sep 1, 2020 | Decoder Image Captioning
- | Unverified | 0