FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | Oct 20, 2024 | Diagnostic Video Captioning | Unverified 0
It's Just Another Day: Unique Video Captioning by Discriminative Prompting | Oct 15, 2024 | Video Captioning | Unverified 0
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Oct 14, 2024 | Video Captioning Video Generation | Code Available 2
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models | Oct 13, 2024 | Cross-Modal Retrieval Question Answering | Unverified 0
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning Large Language Model | Unverified 0
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Oct 4, 2024 | Dense Video Captioning Sentence | Code Available 2
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Sep 26, 2024 | Image Captioning Retrieval | Code Available 1
SoccerNet 2024 Challenges Results | Sep 16, 2024 | Action Spotting Dense Video Captioning | Code Available 0
Fine-grained length controllable video captioning with ordinal embeddings | Aug 27, 2024 | Video Captioning | Unverified 0
LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Aug 19, 2024 | Video Captioning Video Question Answering | Code Available 0
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama | Aug 18, 2024 | Script Generation Video Captioning | Code Available 2
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Aug 12, 2024 | Text-to-Video Generation Video Alignment | Code Available 11
Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption Generation Video Captioning | Code Available 0
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning Diversity | Code Available 1
Learning Video Context as Interleaved Multimodal Sequences | Jul 31, 2024 | Language Modeling Language Modelling | Code Available 1
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Jul 30, 2024 | Semantic Role Labeling Video Captioning | Code Available 0
Wolf: Captioning Everything with a World Summarization Framework | Jul 26, 2024 | Autonomous Driving Mixture-of-Experts | Unverified 0
Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance | Jul 19, 2024 | Automatic Speech Recognition Automatic Speech Recognition (ASR) | Unverified 0
EVLM: An Efficient Vision-Language Model for Visual Understanding | Jul 19, 2024 | Image Captioning Language Modeling | Unverified 0
https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Video Captioning Video Description | Code Available 0
Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jun 30, 2024 | Video Captioning Video Description | Code Available 4
Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks | Jun 24, 2024 | Question Answering Text Generation | Unverified 0
Live Video Captioning | Jun 20, 2024 | Dense Video Captioning Live Video Captioning | Code Available 0
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | Jun 19, 2024 | Language Modeling Language Modelling | Unverified 0
GUI Action Narrator: Where and When Did That Action Take Place? | Jun 19, 2024 | Optical Character Recognition (OCR) Video Captioning | Unverified 0
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Jun 19, 2024 | Question Answering Spatial Reasoning | Code Available 1
A Survey of Video Datasets for Grounded Event Understanding | Jun 14, 2024 | Common Sense Reasoning Event Extraction | Code Available 0
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Jun 13, 2024 | Dense Video Captioning MVBench | Code Available 3
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Jun 11, 2024 | Multiple-choice Question Answering | Code Available 5
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Jun 10, 2024 | Language Modelling Large Language Model | Unverified 0
Vript: A Video Is Worth Thousands of Words | Jun 10, 2024 | Video Captioning Video Understanding | Code Available 2
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Jun 6, 2024 | Video Captioning Video Generation | Code Available 5
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges | Jun 4, 2024 | Question Answering Story Generation | Unverified 0
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization | May 31, 2024 | Sentence Video Captioning | Code Available 1
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | May 22, 2024 | Dense Video Captioning Highlight Detection | Code Available 2
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | May 11, 2024 | Image-text matching Retrieval | Unverified 0
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | May 2, 2024 | Acoustic Scene Classification Event Detection | Unverified 0
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Apr 22, 2024 | Action Quality Assessment multimodal interaction | Code Available 1
Movie101v2: Improved Movie Narration Benchmark | Apr 20, 2024 | Video Captioning | Code Available 2
The 8th AI City Challenge | Apr 15, 2024 | Dense Video Captioning Video Captioning | Unverified 0
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Apr 14, 2024 | Dense Video Captioning Descriptive | Code Available 2
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Apr 12, 2024 | Dense Video Captioning Transfer Learning | Code Available 1
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Apr 11, 2024 | Decoder Dense Video Captioning | Code Available 2
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU Multiple-choice | Code Available 3
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Apr 3, 2024 | Dense Video Captioning Diversity | Unverified 0
Streaming Dense Video Captioning | Apr 1, 2024 | Dense Video Captioning Live Video Captioning | Code Available 0
OmniVid: A Generative Framework for Universal Video Understanding | Mar 26, 2024 | Action Recognition Decoder | Code Available 2
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | Mar 24, 2024 | Dense Video Captioning Temporal Localization | Unverified 0
GiT: Towards Generalist Vision Transformer through Universal Language Interface | Mar 14, 2024 | Language Modeling Language Modelling | Code Available 3
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation | Mar 8, 2024 | Articles Hallucination | Unverified 0