FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | Oct 20, 2024 | Diagnostic, Video Captioning | Unverified | 0
It's Just Another Day: Unique Video Captioning by Discriminative Prompting | Oct 15, 2024 | Video Captioning | Unverified | 0
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Oct 14, 2024 | Video Captioning, Video Generation | Code Available | 2
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models | Oct 13, 2024 | Cross-Modal Retrieval, Question Answering | Unverified | 0
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning, Large Language Model | Unverified | 0
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Oct 4, 2024 | Dense Video Captioning, Sentence | Code Available | 2
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Sep 26, 2024 | Image Captioning, Retrieval | Code Available | 1
SoccerNet 2024 Challenges Results | Sep 16, 2024 | Action Spotting, Dense Video Captioning | Code Available | 0
Fine-grained length controllable video captioning with ordinal embeddings | Aug 27, 2024 | Video Captioning | Unverified | 0
LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Aug 19, 2024 | Video Captioning, Video Question Answering | Code Available | 0
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama | Aug 18, 2024 | Script Generation, Video Captioning | Code Available | 2
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Aug 12, 2024 | Text-to-Video Generation, Video Alignment | Code Available | 11
Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption Generation, Video Captioning | Code Available | 0
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning, Diversity | Code Available | 1
Learning Video Context as Interleaved Multimodal Sequences | Jul 31, 2024 | Language Modeling, Language Modelling | Code Available | 1
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Jul 30, 2024 | Semantic Role Labeling, Video Captioning | Code Available | 0
Wolf: Captioning Everything with a World Summarization Framework | Jul 26, 2024 | Autonomous Driving, Mixture-of-Experts | Unverified | 0
EVLM: An Efficient Vision-Language Model for Visual Understanding | Jul 19, 2024 | Image Captioning, Language Modeling | Unverified | 0
Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance | Jul 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Video Captioning, Video Description | Code Available | 0
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jun 30, 2024 | Video Captioning, Video Description | Code Available | 4
Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks | Jun 24, 2024 | Question Answering, Text Generation | Unverified | 0
Live Video Captioning | Jun 20, 2024 | Dense Video Captioning, Live Video Captioning | Code Available | 0
GUI Action Narrator: Where and When Did That Action Take Place? | Jun 19, 2024 | Optical Character Recognition (OCR), Video Captioning | Unverified | 0
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | Jun 19, 2024 | Language Modeling, Language Modelling | Unverified | 0