DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | Mar 31, 2025 | Video Description, Video Understanding | Unverified | 0
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Mar 31, 2025 | Hallucination, Human-Object Interaction Detection | Unverified | 0
Cross-Modal Learning for Music-to-Music-Video Description Generation | Mar 14, 2025 | Video Description, Video Generation | Unverified | 0
VideoA11y: Method and Dataset for Accessible Video Description | Feb 27, 2025 | Video Description | Unverified | 0
AVD2: Accident Video Diffusion for Accident Video Description | Feb 20, 2025 | Autonomous Driving, Scene Understanding | Unverified | 0
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Feb 11, 2025 | Action Recognition, Video Description | Unverified | 0
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | Jan 14, 2025 | Object Recognition, Text Generation | Unverified | 0
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Dec 17, 2024 | Dense Video Captioning, Descriptive | Code Available | 0
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Nov 11, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 2
PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Oct 30, 2024 | Anomaly Detection, Descriptive | Unverified | 0
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | Oct 20, 2024 | Diagnostic, Video Captioning | Unverified | 0
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | Oct 1, 2024 | Hallucination, text similarity | Unverified | 0
Technical Report: Competition Solution For Modelscope-Sora | Sep 24, 2024 | Text-to-Video Generation, Video Description | Unverified | 0
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Aug 19, 2024 | Instruction Following, Large Language Model | Unverified | 0
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving | Jul 18, 2024 | Autonomous Driving, Image Generation | Code Available | 0
https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Video Captioning, Video Description | Code Available | 0
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jun 30, 2024 | Video Captioning, Video Description | Code Available | 4
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | Jun 13, 2024 | Benchmarking, Human-Object Interaction Detection | Unverified | 0
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles | Jun 11, 2024 | Sentiment Analysis, Subjectivity Analysis | Unverified | 0
Hawk: Learning to Understand Open-World Video Anomalies | May 27, 2024 | Anomaly Detection, Question Answering | Code Available | 3
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Apr 14, 2024 | Dense Video Captioning, Descriptive | Code Available | 2
X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Apr 7, 2024 | Action Recognition, Decision Making | Unverified | 0
JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models | Mar 5, 2024 | In-Context Learning, Video Description | Code Available | 0
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Retrieval, Text Retrieval | Code Available | 4
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews) | Jan 23, 2024 | Miscellaneous, Video Description | Unverified | 0
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition | Jan 22, 2024 | Action Recognition, Video Description | Unverified | 0
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | Dec 12, 2023 | Decoder, Video Captioning | Unverified | 0
Multi Sentence Description of Complex Manipulation Action Videos | Nov 13, 2023 | Decoder, Sentence | Unverified | 0
CLearViD: Curriculum Learning for Video Description | Nov 8, 2023 | Diversity, Video Description | Unverified | 0
Analyzing Political Figures in Real-Time: Leveraging YouTube Metadata for Sentiment Analysis | Sep 28, 2023 | Sentiment Analysis, Video Description | Unverified | 0
FunQA: Towards Surprising Video Comprehension | Jun 26, 2023 | Question Answering, Text Generation | Code Available | 1
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code Available | 0
Edit As You Wish: Video Caption Editing with Multi-grained User Control | May 15, 2023 | Attribute, Position | Code Available | 0
Fine-grained Audible Video Description | Mar 27, 2023 | Language Modeling, Language Modelling | Code Available | 1
Thinking Hallucination for Video Captioning | Sep 28, 2022 | Hallucination, Video Captioning | Code Available | 1
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics | May 12, 2022 | Diversity, Video Description | Code Available | 1
Learn to Understand Negation in Video Retrieval | Apr 30, 2022 | Natural Language Queries, Negation | Code Available | 0
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation | Dec 28, 2021 | Image Captioning, Machine Translation | Unverified | 0
Relational Graph Learning for Grounded Video Description Generation | Dec 2, 2021 | Graph Learning, Hallucination | Unverified | 0
An Efficient Keyframes Selection Based Framework for Video Captioning | Dec 1, 2021 | Text Generation, Video Captioning | Unverified | 0
NarrationBot and InfoBot: A Hybrid System for Automated Video Description | Nov 7, 2021 | Video Description | Unverified | 0
Visual-aware Attention Dual-stream Decoder for Video Captioning | Oct 16, 2021 | Decoder, Video Captioning | Unverified | 0
Boosting Video Captioning with Dynamic Loss Network | Jul 25, 2021 | image-classification, Image Classification | Unverified | 0
Efficient data-driven encoding of scene motion using Eccentricity | Mar 3, 2021 | Activity Recognition, Intent Recognition | Unverified | 0
The Role of the Input in Natural Language Video Description | Feb 9, 2021 | Data Augmentation, Video Description | Unverified | 0
Unbox the Blackbox: Predict and Interpret YouTube Viewership Using Deep Learning | Dec 21, 2020 | Misinformation, Prediction | Unverified | 0
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish | Dec 13, 2020 | Machine Translation, Multimodal Machine Translation | Unverified | 0
A Comprehensive Review on Recent Methods and Challenges of Video Description | Nov 30, 2020 | Machine Translation, Survey | Unverified | 0
Identity-Aware Multi-Sentence Video Description | Aug 22, 2020 | Gender Prediction, Sentence | Code Available | 1