Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers · Feb 29, 2024 · Tasks: Retrieval, Text Retrieval · Code available
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding · Jan 14, 2025 · Tasks: Embodied Question Answering, Hallucination · Code available
Tarsier: Recipes for Training and Evaluating Large Video Description Models · Jun 30, 2024 · Tasks: Video Captioning, Video Description · Code available
Hawk: Learning to Understand Open-World Video Anomalies · May 27, 2024 · Tasks: Anomaly Detection, Question Answering · Code available
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification · Nov 11, 2024 · Tasks: Large Language Model, Multimodal Large Language Model · Code available
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning · Apr 14, 2024 · Tasks: Dense Video Captioning, Descriptive · Code available
FunQA: Towards Surprising Video Comprehension · Jun 26, 2023 · Tasks: Question Answering, Text Generation · Code available
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research · Apr 6, 2019 · Tasks: Machine Translation, Translation · Code available
Fine-grained Audible Video Description · Mar 27, 2023 · Tasks: Language Modeling, Language Modelling · Code available
Identity-Aware Multi-Sentence Video Description · Aug 22, 2020 · Tasks: Gender Prediction, Sentence · Code available
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics · May 12, 2022 · Tasks: Diversity, Video Description · Code available
Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research · Mar 3, 2015 · Tasks: Descriptive, Video Description · Code available
Grounded Video Description · Dec 17, 2018 · Tasks: Image Description, Sentence · Code available
Delving Deeper into the Decoder for Video Captioning · Jan 16, 2020 · Tasks: Decoder, Sentence · Code available
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 · Jun 1, 2018 · Tasks: Video Description, Visual Dialog · Code available
Thinking Hallucination for Video Captioning · Sep 28, 2022 · Tasks: Hallucination, Video Captioning · Code available
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents · Aug 18, 2020 · Tasks: Video Description · Code available
A Mid-level Video Representation based on Binary Descriptors: A Case Study for Pornography Detection · May 12, 2016 · Tasks: Pornography Detection, Video Description · Code available
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning · Dec 17, 2024 · Tasks: Dense Video Captioning, Descriptive · Code available
Video Description using Bidirectional Recurrent Neural Networks · Apr 12, 2016 · Tasks: Decoder, Text Generation · Code available
https://arxiv.org/abs/2407.00634 · Jul 2, 2024 · Tasks: Video Captioning, Video Description · Code available
VizSeq: A Visual Analysis Toolkit for Text Generation Tasks · Sep 12, 2019 · Tasks: Benchmarking, Image Captioning · Code available
TGIF: A New Dataset and Benchmark on Animated GIF Description · Apr 10, 2016 · Tasks: Image Captioning, Machine Translation · Code available
Adversarial Inference for Multi-Sentence Video Description · Dec 13, 2018 · Tasks: Diversity, Image Captioning · Code available
Predicting Visual Features from Text for Image and Video Caption Retrieval · Sep 5, 2017 · Tasks: Retrieval, Sentence · Code available
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian · Jun 20, 2023 · Tasks: Cross-Lingual Transfer, Retrieval · Code available
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving · Jul 18, 2024 · Tasks: Autonomous Driving, Image Generation · Code available
Egocentric Video Description based on Temporally-Linked Sequences · Apr 7, 2017 · Tasks: Decoder, Video Description · Code available
JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models · Mar 5, 2024 · Tasks: In-Context Learning, Video Description · Code available
Describing Videos by Exploiting Temporal Structure · Feb 27, 2015 · Tasks: Action Recognition, Image Description · Code available
Edit As You Wish: Video Caption Editing with Multi-grained User Control · May 15, 2023 · Tasks: Attribute, Position · Code available
Learn to Understand Negation in Video Retrieval · Apr 30, 2022 · Tasks: Natural Language Queries, Negation · Code available
Memory-augmented Attention Modelling for Videos · Nov 7, 2016 · Tasks: Video Description · Code available
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features · Jun 21, 2018 · Tasks: Question Answering, Video Description · Code available
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text · Apr 6, 2016 · Tasks: Descriptive, Language Modeling · Code available
Attention-Based Multimodal Fusion for Video Description · Jan 11, 2017 · Tasks: Decoder, Sentence · Code unverified
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description · Mar 31, 2025 · Tasks: Video Description, Video Understanding · Code unverified
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) · Dec 12, 2023 · Tasks: Decoder, Video Captioning · Code unverified
Cross-Modal Learning for Music-to-Music-Video Description Generation · Mar 14, 2025 · Tasks: Video Description, Video Generation · Code unverified
Coherent Multi-Sentence Video Description with Variable Level of Detail · Mar 24, 2014 · Tasks: Sentence, Video Description · Code unverified
Attend and Interact: Higher-Order Object Interactions for Video Understanding · Nov 16, 2017 · Tasks: Action Classification, Action Recognition · Code unverified
CLearViD: Curriculum Learning for Video Description · Nov 8, 2023 · Tasks: Diversity, Video Description · Code unverified
Prediction and Description of Near-Future Activities in Video · Aug 2, 2019 · Tasks: Prediction, Video Captioning · Code unverified
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching · Jun 1, 2013 · Tasks: Image Description, Video Description · Code unverified
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles · Jun 11, 2024 · Tasks: Sentiment Analysis, Subjectivity Analysis · Code unverified
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking · Jul 27, 2020 · Tasks: Active Learning, Video Captioning · Code unverified
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning · Oct 20, 2024 · Tasks: Diagnostic, Video Captioning · Code unverified
Incorporating Background Knowledge into Video Description Generation · Oct 1, 2018 · Tasks: Decoder, Text Generation · Code unverified
Incorporating Global Visual Features into Attention-based Neural Machine Translation · Sep 1, 2017 · Tasks: Decoder, Machine Translation · Code unverified
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation · Mar 31, 2025 · Tasks: Hallucination, Human-Object Interaction Detection · Code unverified