Delving Deeper into the Decoder for Video Captioning Jan 16, 2020 Decoder Sentence
Code Code Available 1Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization May 31, 2024 Sentence Video Captioning
Code Code Available 1SoccerNet 2023 Challenges Results Sep 12, 2023 Action Spotting Camera Calibration
Code Code Available 1An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling Sep 4, 2022 Fill Mask Optical Flow Estimation
Code Code Available 1Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval Aug 15, 2023 Retrieval Video Captioning
Code Code Available 1Syntax-Aware Action Targeting for Video Captioning Jun 1, 2020 Video Captioning
Code Code Available 1COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark Aug 5, 2024 Dense Video Captioning Diversity
Code Code Available 1Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language Nov 18, 2020 Dictionary Learning Disentanglement
Code Code Available 1Narrative Action Evaluation with Prompt-Guided Multimodal Interaction Apr 22, 2024 Action Quality Assessment multimodal interaction
Code Code Available 1A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules Jan 8, 2021 Decoder Deep Reinforcement Learning
Code Code Available 1PaLI-X: On Scaling up a Multilingual Vision and Language Model May 29, 2023 Chart Question Answering document understanding
Code Code Available 1MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning Aug 25, 2023 Image Captioning Video Captioning
Code Code Available 1A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos May 2, 2020 Action Detection Form
Code Code Available 1Multi-modal Dense Video Captioning Mar 17, 2020 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 1MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning May 11, 2020 Sentence Video Captioning
Code Code Available 1LVCHAT: Facilitating Long Video Comprehension Feb 19, 2024 Video Captioning
Code Code Available 1MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models Mar 23, 2023 Auxiliary Learning Multimodal Sentiment Analysis
Code Code Available 1Learning to Discretely Compose Reasoning Module Networks for Video Captioning Jul 17, 2020 Decoder Question Answering
Code Code Available 1Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures Jul 27, 2023 Automatic Speech Recognition Contrastive Learning
Code Code Available 1Learning to Generate Grounded Visual Captions without Localization Supervision Aug 1, 2020 Image Captioning Language Modelling
Code Code Available 1Movie101: A New Movie Understanding Benchmark May 20, 2023 Video Captioning
Code Code Available 1Partially Relevant Video Retrieval Aug 26, 2022 Moment Retrieval Multiple Instance Learning
Code Code Available 1IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning Sep 26, 2024 Image Captioning Retrieval
Code Code Available 1Improving Generation and Evaluation of Visual Stories via Semantic Consistency May 20, 2021 Image Generation Story Visualization
Code Code Available 1Large Scale Holistic Video Understanding Apr 25, 2019 Action Classification Action Recognition
Code Code Available 1Learning Video Context as Interleaved Multimodal Sequences Jul 31, 2024 Language Modeling Language Modelling
Code Code Available 1HowToCaption: Prompting LLMs to Transform Video Annotations at Scale Oct 7, 2023 Automatic Speech Recognition Video Captioning
Code Code Available 1Controllable Video Captioning with an Exemplar Sentence Dec 2, 2021 Caption Generation Decoder
Code Code Available 1Action knowledge for video captioning with graph neural networks Mar 16, 2023 Action Recognition Graph Neural Network
Code Code Available 1COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Nov 1, 2020 Cross-Modal Retrieval Representation Learning
Code Code Available 1COSA: Concatenated Sample Pretrained Vision-Language Foundation Model Jun 15, 2023 Form model
Code Code Available 1Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks Nov 14, 2021 Action Classification Object
Code Code Available 1Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners May 22, 2022 Attribute Automatic Speech Recognition
Code Code Available 1LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning Jun 17, 2023 Boundary Captioning Language Modeling
Code Code Available 1HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning Dec 19, 2024 Dense Video Captioning Video Captioning
Code Code Available 1Discriminative Latent Semantic Graph for Video Captioning Aug 8, 2021 Decoder Object
Code Code Available 1Hierarchical Modular Network for Video Captioning Nov 24, 2021 Representation Learning Sentence
Code Code Available 1Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data Jan 16, 2024 Image Generation Text to Image Generation
Code Code Available 1MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration Apr 17, 2022 Navigate Retrieval
Code Code Available 1DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization Jun 1, 2021 Question Answering Retrieval
Code Code Available 1EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Nov 17, 2021 Language Modelling Video Captioning
Code Code Available 1Multimodal Pretraining for Dense Video Captioning Nov 10, 2020 Dense Video Captioning Video Captioning
Code Code Available 1Comprehensive Information Integration Modeling Framework for Video Titling Jun 24, 2020 Descriptive Video Captioning
Code Code Available 1OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation Aug 8, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 1AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding Jun 19, 2024 Question Answering Spatial Reasoning
Code Code Available 1Hierarchical Video-Moment Retrieval and Step-Captioning Mar 29, 2023 Information Retrieval Moment Retrieval
Code Code Available 1End-to-End Dense Video Captioning with Parallel Decoding Aug 17, 2021 Caption Generation Dense Video Captioning
Code Code Available 1Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis Apr 12, 2024 Dense Video Captioning Transfer Learning
Code Code Available 1A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer May 17, 2020 Dense Video Captioning Temporal Action Proposal Generation
Code Code Available 1LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling Jun 14, 2022 Decoder Language Modeling
Code Code Available 1