IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning (Sep 26, 2024). Tags: Image Captioning, Retrieval. Code available.
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning (Nov 28, 2022). Tags: Diversity, Sentence. Code available.
Poet: Product-oriented Video Captioner for E-commerce (Aug 16, 2020). Tags: Video Captioning. Code available.
Discriminative Latent Semantic Graph for Video Captioning (Aug 8, 2021). Tags: Decoder, Object. Code available.
RTQ: Rethinking Video-language Understanding Based on Image-text Model (Dec 1, 2023). Tags: Video Captioning, Video Question Answering. Code available.
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (Mar 21, 2023). Tags: Contrastive Learning, Image Captioning. Code available.
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale (Oct 7, 2023). Tags: Automatic Speech Recognition, Video Captioning. Code available.
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks (Nov 23, 2020). Tags: Action Classification, Action Localization. Code available.
Large Scale Holistic Video Understanding (Apr 25, 2019). Tags: Action Classification, Action Recognition. Code available.
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation (Mar 11, 2023). Tags: Image Captioning, Image to text. Code available.
HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning (Dec 19, 2024). Tags: Dense Video Captioning, Video Captioning. Code available.
Semantic Grouping Network for Video Captioning (Feb 1, 2021). Tags: Video Captioning. Code available.
The MSR-Video to Text Dataset with Clean Annotations (Feb 12, 2021). Tags: Sentence, Video Captioning. Code available.
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners (May 22, 2022). Tags: Attribute, Automatic Speech Recognition. Code available.
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark (Aug 5, 2024). Tags: Dense Video Captioning, Diversity. Code available.
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching (Nov 17, 2021). Tags: Language Modelling, Video Captioning. Code available.
Improving Generation and Evaluation of Visual Stories via Semantic Consistency (May 20, 2021). Tags: Image Generation, Story Visualization. Code available.
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (Jul 15, 2025). Tags: Video Captioning, Video Understanding. Code available.
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation (Mar 26, 2023). Tags: Video Captioning. Code available.
Syntax-Aware Action Targeting for Video Captioning (Jun 1, 2020). Tags: Video Captioning. Code available.
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation (Aug 17, 2016). Tags: Caption Generation, Decoder. Code available.
A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules (Jan 8, 2021). Tags: Decoder, Deep Reinforcement Learning. Code available.
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (Apr 1, 2021). Tags: Retrieval, Text Retrieval. Code available.
Hierarchical Modular Network for Video Captioning (Nov 24, 2021). Tags: Representation Learning, Sentence. Code available.
Accurate and Fast Compressed Video Captioning (Sep 22, 2023). Tags: Video Captioning. Code available.
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis (Apr 12, 2024). Tags: Dense Video Captioning, Transfer Learning. Code available.
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o (Dec 18, 2024). Tags: Image Captioning, Video Captioning. Code available.
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language (Nov 18, 2020). Tags: Dictionary Learning, Disentanglement. Code available.
Tell me what you see: A zero-shot action recognition method based on natural language descriptions (Dec 18, 2021). Tags: Action Recognition, Descriptive. Code available.
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping (Apr 26, 2023). Tags: Decoder, Image Captioning. Code available.
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training (Jul 18, 2022). Tags: Dense Video Captioning, Event Detection. Code available.
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning (Jun 26, 2022). Tags: Contrastive Learning, Diversity. Code available.
End-to-End Video Captioning with Multitask Reinforcement Learning (Mar 21, 2018). Tags: GPU, reinforcement-learning. Code available.
SoccerNet 2024 Challenges Results (Sep 16, 2024). Tags: Action Spotting, Dense Video Captioning. Code available.
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation (Sep 13, 2022). Tags: Image Generation, Story Continuation. Code available.
End-to-End Dense Video Captioning with Masked Transformer (Apr 3, 2018). Tags: Decoder, Dense Video Captioning. Code available.
Sketch, Ground, and Refine: Top-Down Dense Video Captioning (Jun 19, 2021). Tags: Dense Video Captioning, Sentence. Code available.
Streamlined Dense Video Captioning (Apr 8, 2019). Tags: Dense Video Captioning, Reinforcement Learning. Code available.
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention (Sep 7, 2021). Tags: Sensor Fusion, Video Captioning. Code available.
Event and Entity Extraction from Generated Video Captions (Nov 5, 2022). Tags: Caption Generation, Dense Video Captioning. Code available.
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos (Jul 30, 2024). Tags: Semantic Role Labeling, Video Captioning. Code available.
Video captioning with stacked attention and semantic hard pull (Sep 15, 2020). Tags: Decoder, Video Captioning. Code available.
Edit As You Wish: Video Caption Editing with Multi-grained User Control (May 15, 2023). Tags: Attribute, Position. Code available.
ECO: Efficient Convolutional Network for Online Video Understanding (Apr 24, 2018). Tags: Action Classification, Action Recognition. Code available.
Support-set based Multi-modal Representation Enhancement for Video Captioning (May 19, 2022). Tags: Video Captioning. Code available.
Reconstruction Network for Video Captioning (Mar 30, 2018). Tags: Decoder, Sentence. Code available.
Dual-Stream Transformer for Generic Event Boundary Captioning (Jul 7, 2022). Tags: Boundary Captioning, Video Captioning. Code available.
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning (Nov 28, 2022). Tags: FAD, Video Captioning. Code available.
Accommodating Audio Modality in CLIP for Multimodal Processing (Mar 12, 2023). Tags: AudioCaps, Contrastive Learning. Code available.
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning (Nov 6, 2024). Tags: Video Captioning. Code available.