Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jun 30, 2024 | Video Captioning, Video Description | Code Available | 4
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Retrieval, Text Retrieval | Code Available | 4
Hawk: Learning to Understand Open-World Video Anomalies | May 27, 2024 | Anomaly Detection, Question Answering | Code Available | 3
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Nov 11, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 2
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Apr 14, 2024 | Dense Video Captioning, Descriptive | Code Available | 2
FunQA: Towards Surprising Video Comprehension | Jun 26, 2023 | Question Answering, Text Generation | Code Available | 1
Fine-grained Audible Video Description | Mar 27, 2023 | Language Modeling, Language Modelling | Code Available | 1
Thinking Hallucination for Video Captioning | Sep 28, 2022 | Hallucination, Video Captioning | Code Available | 1
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics | May 12, 2022 | Diversity, Video Description | Code Available | 1
Identity-Aware Multi-Sentence Video Description | Aug 22, 2020 | Gender Prediction, Sentence | Code Available | 1
Delving Deeper into the Decoder for Video Captioning | Jan 16, 2020 | Decoder, Sentence | Code Available | 1
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | Apr 6, 2019 | Machine Translation, Translation | Code Available | 1
Grounded Video Description | Dec 17, 2018 | Image Description, Sentence | Code Available | 1
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 | Jun 1, 2018 | Video Description, Visual Dialog | Code Available | 1
Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research | Mar 3, 2015 | Descriptive, Video Description | Code Available | 1
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Mar 31, 2025 | Video Description, Video Understanding | Unverified | 0
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Mar 31, 2025 | Hallucination, Human-Object Interaction Detection | Unverified | 0
Cross-Modal Learning for Music-to-Music-Video Description Generation | Mar 14, 2025 | Video Description, Video Generation | Unverified | 0
VideoA11y: Method and Dataset for Accessible Video Description | Feb 27, 2025 | Video Description | Unverified | 0
AVD2: Accident Video Diffusion for Accident Video Description | Feb 20, 2025 | Autonomous Driving, Scene Understanding | Unverified | 0
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Feb 11, 2025 | Action Recognition, Video Description | Unverified | 0
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | Jan 14, 2025 | Object Recognition, Text Generation | Unverified | 0
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Dec 17, 2024 | Dense Video Captioning, Descriptive | Code Available | 0
PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Oct 30, 2024 | Anomaly Detection, Descriptive | Unverified | 0
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | Oct 20, 2024 | Diagnostic, Video Captioning | Unverified | 0
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Oct 1, 2024 | Hallucination, text similarity | Unverified | 0
Technical Report: Competition Solution For Modelscope-Sora | Sep 24, 2024 | Text-to-Video Generation, Video Description | Unverified | 0
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Aug 19, 2024 | Instruction Following, Large Language Model | Unverified | 0
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving | Jul 18, 2024 | Autonomous Driving, Image Generation | Code Available | 0
https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Video Captioning, Video Description | Code Available | 0
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | Jun 13, 2024 | Benchmarking, Human-Object Interaction Detection | Unverified | 0
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles | Jun 11, 2024 | Sentiment Analysis, Subjectivity Analysis | Unverified | 0
X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Apr 7, 2024 | Action Recognition, Decision Making | Unverified | 0
JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models | Mar 5, 2024 | In-Context Learning, Video Description | Code Available | 0
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews) | Jan 23, 2024 | Miscellaneous, Video Description | Unverified | 0
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition | Jan 22, 2024 | Action Recognition, Video Description | Unverified | 0
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | Dec 12, 2023 | Decoder, Video Captioning | Unverified | 0
Multi Sentence Description of Complex Manipulation Action Videos | Nov 13, 2023 | Decoder, Sentence | Unverified | 0
CLearViD: Curriculum Learning for Video Description | Nov 8, 2023 | Diversity, Video Description | Unverified | 0
Analyzing Political Figures in Real-Time: Leveraging YouTube Metadata for Sentiment Analysis | Sep 28, 2023 | Sentiment Analysis, Video Description | Unverified | 0
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code Available | 0
Edit As You Wish: Video Caption Editing with Multi-grained User Control | May 15, 2023 | Attribute, Position | Code Available | 0
Learn to Understand Negation in Video Retrieval | Apr 30, 2022 | Natural Language Queries, Negation | Code Available | 0
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation | Dec 28, 2021 | Image Captioning, Machine Translation | Unverified | 0
Relational Graph Learning for Grounded Video Description Generation | Dec 2, 2021 | Graph Learning, Hallucination | Unverified | 0
An Efficient Keyframes Selection Based Framework for Video Captioning | Dec 1, 2021 | Text Generation, Video Captioning | Unverified | 0
NarrationBot and InfoBot: A Hybrid System for Automated Video Description | Nov 7, 2021 | Video Description | Unverified | 0
Visual-aware Attention Dual-stream Decoder for Video Captioning | Oct 16, 2021 | Decoder, Video Captioning | Unverified | 0
Boosting Video Captioning with Dynamic Loss Network | Jul 25, 2021 | image-classification, Image Classification | Unverified | 0