IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning Sep 26, 2024 Image Captioning Retrieval
Code Code Available 1Hierarchical Video-Moment Retrieval and Step-Captioning Mar 29, 2023 Information Retrieval Moment Retrieval
Code Code Available 1Learning Video Context as Interleaved Multimodal Sequences Jul 31, 2024 Language Modeling Language Modelling
Code Code Available 1Large Scale Holistic Video Understanding Apr 25, 2019 Action Classification Action Recognition
Code Code Available 1COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Nov 1, 2020 Cross-Modal Retrieval Representation Learning
Code Code Available 1COSA: Concatenated Sample Pretrained Vision-Language Foundation Model Jun 15, 2023 Form model
Code Code Available 1Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks Nov 14, 2021 Action Classification Object
Code Code Available 1Improving Generation and Evaluation of Visual Stories via Semantic Consistency May 20, 2021 Image Generation Story Visualization
Code Code Available 1Learning to Generate Grounded Visual Captions without Localization Supervision Aug 1, 2020 Image Captioning Language Modelling
Code Code Available 1HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training May 1, 2020 Language Modeling Language Modelling
Code Code Available 1Discriminative Latent Semantic Graph for Video Captioning Aug 8, 2021 Decoder Object
Code Code Available 1A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos May 2, 2020 Action Detection Form
Code Code Available 1EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Nov 17, 2021 Language Modelling Video Captioning
Code Code Available 1HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning Dec 19, 2024 Dense Video Captioning Video Captioning
Code Code Available 1Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data Jan 16, 2024 Image Generation Text to Image Generation
Code Code Available 1Comprehensive Information Integration Modeling Framework for Video Titling Jun 24, 2020 Descriptive Video Captioning
Code Code Available 1Multi-modal Dense Video Captioning Mar 17, 2020 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 1Narrative Action Evaluation with Prompt-Guided Multimodal Interaction Apr 22, 2024 Action Quality Assessment multimodal interaction
Code Code Available 1Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language Nov 18, 2020 Dictionary Learning Disentanglement
Code Code Available 1AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding Jun 19, 2024 Question Answering Spatial Reasoning
Code Code Available 1Hierarchical Modular Network for Video Captioning Nov 24, 2021 Representation Learning Sentence
Code Code Available 1End-to-End Dense Video Captioning with Parallel Decoding Aug 17, 2021 Caption Generation Dense Video Captioning
Code Code Available 1Poet: Product-oriented Video Captioner for E-commerce Aug 16, 2020 Video Captioning
Code Code Available 1A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer May 17, 2020 Dense Video Captioning Temporal Action Proposal Generation
Code Code Available 1Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners May 22, 2022 Attribute Automatic Speech Recognition
Code Code Available 1