Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos (Dec 16, 2023): Video Captioning, video narration captioning
RTQ: Rethinking Video-language Understanding Based on Image-text Model (Dec 1, 2023): Video Captioning, Video Question Answering
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale (Oct 7, 2023): Automatic Speech Recognition, Video Captioning
Accurate and Fast Compressed Video Captioning (Sep 22, 2023): Video Captioning
SoccerNet 2023 Challenges Results (Sep 12, 2023): Action Spotting, Camera Calibration
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning (Aug 25, 2023): Image Captioning, Video Captioning
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control (Aug 18, 2023): Image Captioning, Text Generation
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval (Aug 15, 2023): Retrieval, Video Captioning
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation (Aug 8, 2023): Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures (Jul 27, 2023): Automatic Speech Recognition, Contrastive Learning
LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning (Jun 17, 2023): Boundary Captioning, Language Modeling
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model (Jun 15, 2023): Form, model
PaLI-X: On Scaling up a Multilingual Vision and Language Model (May 29, 2023): Chart Question Answering, document understanding
Movie101: A New Movie Understanding Benchmark (May 20, 2023): Video Captioning
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping (Apr 26, 2023): Decoder, Image Captioning
Hierarchical Video-Moment Retrieval and Step-Captioning (Mar 29, 2023): Information Retrieval, Moment Retrieval
Fine-grained Audible Video Description (Mar 27, 2023): Language Modeling, Language Modelling
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation (Mar 26, 2023): Video Captioning
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (Mar 23, 2023): Auxiliary Learning, Multimodal Sentiment Analysis
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (Mar 21, 2023): Contrastive Learning, Image Captioning
Action knowledge for video captioning with graph neural networks (Mar 16, 2023): Action Recognition, Graph Neural Network
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation (Mar 11, 2023): Image Captioning, Image to text
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos (Mar 11, 2023): Dense Video Captioning, Natural Language Moment Retrieval
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning (Nov 28, 2022): Diversity, Sentence
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations (Nov 21, 2022): Contrastive Learning, Representation Learning
Visual Commonsense-aware Representation Network for Video Captioning (Nov 17, 2022): Caption Generation, Question Answering
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality (Nov 1, 2022): Data Augmentation, Image Retrieval
Thinking Hallucination for Video Captioning (Sep 28, 2022): Hallucination, Video Captioning
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling (Sep 4, 2022): Fill Mask, Optical Flow Estimation
Partially Relevant Video Retrieval (Aug 26, 2022): Moment Retrieval, Multiple Instance Learning
Zero-Shot Video Captioning with Evolving Pseudo-Tokens (Jul 22, 2022): Image Captioning, Image-text matching
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training (Jul 18, 2022): Dense Video Captioning, Event Detection
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches (Jun 30, 2022): Caption Generation, Video Captioning
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning (Jun 26, 2022): Contrastive Learning, Diversity
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling (Jun 14, 2022): Decoder, Language Modeling
GL-RG: Global-Local Representation Granularity for Video Captioning (May 22, 2022): Caption Generation, Descriptive
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners (May 22, 2022): Attribute, Automatic Speech Recognition
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration (Apr 17, 2022): Navigate, Retrieval
Tell me what you see: A zero-shot action recognition method based on natural language descriptions (Dec 18, 2021): Action Recognition, Descriptive
Controllable Video Captioning with an Exemplar Sentence (Dec 2, 2021): Caption Generation, Decoder
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning (Nov 25, 2021): Caption Generation, Question Answering
Hierarchical Modular Network for Video Captioning (Nov 24, 2021): Representation Learning, Sentence
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching (Nov 17, 2021): Language Modelling, Video Captioning
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks (Nov 14, 2021): Action Classification, Object
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics (Aug 18, 2021): Cross-Modal Retrieval, Decoder
End-to-End Dense Video Captioning with Parallel Decoding (Aug 17, 2021): Caption Generation, Dense Video Captioning
Discriminative Latent Semantic Graph for Video Captioning (Aug 8, 2021): Decoder, Object
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation (Jun 8, 2021): Multi-Task Learning, Question Answering
DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (Jun 1, 2021): Question Answering, Retrieval
Improving Generation and Evaluation of Visual Stories via Semantic Consistency (May 20, 2021): Image Generation, Story Visualization