Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners (May 22, 2022). Tasks: Attribute, Automatic Speech Recognition.
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures (Jul 27, 2023). Tasks: Automatic Speech Recognition, Contrastive Learning.
Delving Deeper into the Decoder for Video Captioning (Jan 16, 2020). Tasks: Decoder, Sentence.
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling (Jun 14, 2022). Tasks: Decoder, Language Modeling.
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling (Sep 4, 2022). Tasks: Fill Mask, Optical Flow Estimation.
The MSR-Video to Text Dataset with Clean Annotations (Feb 12, 2021). Tasks: Sentence, Video Captioning.
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark (Aug 5, 2024). Tasks: Dense Video Captioning, Diversity.
Learning to Discretely Compose Reasoning Module Networks for Video Captioning (Jul 17, 2020). Tasks: Decoder, Question Answering.
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization (May 31, 2024). Tasks: Sentence, Video Captioning.
A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules (Jan 8, 2021). Tasks: Decoder, Deep Reinforcement Learning.
HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning (Dec 19, 2024). Tasks: Dense Video Captioning, Video Captioning.
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos (Dec 16, 2023). Tasks: Video Captioning, video narration captioning.
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation (Mar 26, 2023). Tasks: Video Captioning.
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos (May 2, 2020). Tasks: Action Detection, Form.
Semantic Grouping Network for Video Captioning (Feb 1, 2021). Tasks: Video Captioning.
SoccerNet 2023 Challenges Results (Sep 12, 2023). Tasks: Action Spotting, Camera Calibration.
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval (Aug 15, 2023). Tasks: Retrieval, Video Captioning.
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (Mar 21, 2023). Tasks: Contrastive Learning, Image Captioning.
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches (Jun 30, 2022). Tasks: Caption Generation, Video Captioning.
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model (Jun 15, 2023). Tasks: Form, model.
Action knowledge for video captioning with graph neural networks (Mar 16, 2023). Tasks: Action Recognition, Graph Neural Network.
Fine-grained Audible Video Description (Mar 27, 2023). Tasks: Language Modeling.
Hierarchical Modular Network for Video Captioning (Nov 24, 2021). Tasks: Representation Learning, Sentence.
RTQ: Rethinking Video-language Understanding Based on Image-text Model (Dec 1, 2023). Tasks: Video Captioning, Video Question Answering.
Controllable Video Captioning with an Exemplar Sentence (Dec 2, 2021). Tasks: Caption Generation, Decoder.
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis (Apr 12, 2024). Tasks: Dense Video Captioning, Transfer Learning.
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language (Nov 18, 2020). Tasks: Dictionary Learning, Disentanglement.
Learning Video Context as Interleaved Multimodal Sequences (Jul 31, 2024). Tasks: Language Modeling.
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation (Aug 8, 2023). Tasks: Automatic Speech Recognition (ASR).
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning (Nov 1, 2020). Tasks: Cross-Modal Retrieval, Representation Learning.
PaLI-X: On Scaling up a Multilingual Vision and Language Model (May 29, 2023). Tasks: Chart Question Answering, document understanding.
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks (Nov 14, 2021). Tasks: Action Classification, Object.
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (Apr 1, 2021). Tasks: Retrieval, Text Retrieval.
Poet: Product-oriented Video Captioner for E-commerce (Aug 16, 2020). Tasks: Video Captioning.
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation (Aug 17, 2016). Tasks: Caption Generation, Decoder.
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping (Apr 26, 2023). Tasks: Decoder, Image Captioning.
Multi-modal Dense Video Captioning (Mar 17, 2020). Tasks: Automatic Speech Recognition (ASR).
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations (Nov 21, 2022). Tasks: Contrastive Learning, Representation Learning.
GL-RG: Global-Local Representation Granularity for Video Captioning (May 22, 2022). Tasks: Caption Generation, Descriptive.
DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (Jun 1, 2021). Tasks: Question Answering, Retrieval.
Multimodal Pretraining for Dense Video Captioning (Nov 10, 2020). Tasks: Dense Video Captioning, Video Captioning.
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data (Jan 16, 2024). Tasks: Image Generation, Text to Image Generation.
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o (Dec 18, 2024). Tasks: Image Captioning, Video Captioning.
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training (May 1, 2020). Tasks: Language Modeling.
Comprehensive Information Integration Modeling Framework for Video Titling (Jun 24, 2020). Tasks: Descriptive, Video Captioning.
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding (Jun 19, 2024). Tasks: Question Answering, Spatial Reasoning.
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale (Oct 7, 2023). Tasks: Automatic Speech Recognition, Video Captioning.
Large Scale Holistic Video Understanding (Apr 25, 2019). Tasks: Action Classification, Action Recognition.
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer (May 17, 2020). Tasks: Dense Video Captioning, Temporal Action Proposal Generation.
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction (Apr 22, 2024). Tasks: Action Quality Assessment, multimodal interaction.