Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Retrieval, Text Retrieval | Code Available | 4
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jun 30, 2024 | Video Captioning, Video Description | Code Available | 4
Hawk: Learning to Understand Open-World Video Anomalies | May 27, 2024 | Anomaly Detection, Question Answering | Code Available | 3
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Apr 14, 2024 | Dense Video Captioning, Descriptive | Code Available | 2
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Nov 11, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 2
Identity-Aware Multi-Sentence Video Description | Aug 22, 2020 | Gender Prediction, Sentence | Code Available | 1
Delving Deeper into the Decoder for Video Captioning | Jan 16, 2020 | Decoder, Sentence | Code Available | 1
Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research | Mar 3, 2015 | Descriptive, Video Description | Code Available | 1
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics | May 12, 2022 | Diversity, Video Description | Code Available | 1
Fine-grained Audible Video Description | Mar 27, 2023 | Language Modeling, Language Modelling | Code Available | 1
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | Apr 6, 2019 | Machine Translation, Translation | Code Available | 1
FunQA: Towards Surprising Video Comprehension | Jun 26, 2023 | Question Answering, Text Generation | Code Available | 1
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 | Jun 1, 2018 | Video Description, Visual Dialog | Code Available | 1
Thinking Hallucination for Video Captioning | Sep 28, 2022 | Hallucination, Video Captioning | Code Available | 1
Grounded Video Description | Dec 17, 2018 | Image Description, Sentence | Code Available | 1
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language | Jun 1, 2016 | Image Captioning, Sentence | Unverified | 0
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | Dec 12, 2023 | Decoder, Video Captioning | Unverified | 0
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish | Dec 13, 2020 | Machine Translation, Multimodal Machine Translation | Unverified | 0
Attend and Interact: Higher-Order Object Interactions for Video Understanding | Nov 16, 2017 | Action Classification, Action Recognition | Unverified | 0
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching | Jun 1, 2013 | Image Description, Video Description | Unverified | 0
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles | Jun 11, 2024 | Sentiment Analysis, Subjectivity Analysis | Unverified | 0
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | Jun 13, 2024 | Benchmarking, Human-Object Interaction Detection | Unverified | 0
CLearViD: Curriculum Learning for Video Description | Nov 8, 2023 | Diversity, Video Description | Unverified | 0
Coherent Multi-Sentence Video Description with Variable Level of Detail | Mar 24, 2014 | Sentence, Video Description | Unverified | 0
Cross-Modal Learning for Music-to-Music-Video Description Generation | Mar 14, 2025 | Video Description, Video Generation | Unverified | 0
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Mar 31, 2025 | Video Description, Video Understanding | Unverified | 0
Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering | Jan 3, 2020 | Question Answering, Video Description | Unverified | 0
Attention-Based Multimodal Fusion for Video Description | Jan 11, 2017 | Decoder, Sentence | Unverified | 0
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking | Jul 27, 2020 | Active Learning, Video Captioning | Unverified | 0
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition | Jan 22, 2024 | Action Recognition, Video Description | Unverified | 0
End-to-End Video Captioning | Apr 4, 2019 | Action Recognition, Caption Generation | Unverified | 0
Interpretable Video Captioning via Trajectory Structured Localization | Jun 1, 2018 | Decoder, Image Captioning | Unverified | 0
Boosting Video Captioning with Dynamic Loss Network | Jul 25, 2021 | image-classification, Image Classification | Unverified | 0
Bidirectional Long-Short Term Memory for Video Description | Jun 15, 2016 | Language Modeling, Language Modelling | Unverified | 0
An Efficient Keyframes Selection Based Framework for Video Captioning | Dec 1, 2021 | Text Generation, Video Captioning | Unverified | 0
JU_CSE_NLP: Multi-grade Classification of Semantic Similarity between Text Pairs | Jul 1, 2012 | General Classification, Semantic Similarity | Unverified | 0
Generating Video Description using Sequence-to-sequence Model with Temporal Attention | Dec 1, 2016 | Caption Generation, Sentence | Unverified | 0
Better Exploiting Motion for Better Action Recognition | Jun 1, 2013 | Action Recognition, Image Retrieval | Unverified | 0
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Feb 11, 2025 | Action Recognition, Video Description | Unverified | 0
HENRY-CORE: Domain Adaptation and Stacking for Text Similarity | Jun 1, 2013 | Domain Adaptation, Machine Translation | Unverified | 0
Hierarchical Boundary-Aware Neural Encoder for Video Captioning | Nov 28, 2016 | Decoder, Video Captioning | Unverified | 0
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Mar 31, 2025 | Hallucination, Human-Object Interaction Detection | Unverified | 0
Bridge Video and Text with Cascade Syntactic Structure | Aug 1, 2018 | Attribute, Object | Unverified | 0
AVD2: Accident Video Diffusion for Accident Video Description | Feb 20, 2025 | Autonomous Driving, Scene Understanding | Unverified | 0
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | Oct 20, 2024 | Diagnostic, Video Captioning | Unverified | 0
Prediction and Description of Near-Future Activities in Video | Aug 2, 2019 | Prediction, Video Captioning | Unverified | 0
Incorporating Background Knowledge into Video Description Generation | Oct 1, 2018 | Decoder, Text Generation | Unverified | 0
Incorporating Global Visual Features into Attention-based Neural Machine Translation. | Sep 1, 2017 | Decoder, Machine Translation | Unverified | 0
Analyzing Political Figures in Real-Time: Leveraging YouTube Metadata for Sentiment Analysis | Sep 28, 2023 | Sentiment Analysis, Video Description | Unverified | 0