AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU, Question Answering | Unverified
Event-Equalized Dense Video Captioning | Jan 1, 2025 | Dense Video Captioning, Video Captioning | Unverified
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | Dec 31, 2024 | Retrieval, Text Retrieval | Unverified
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Dec 30, 2024 | Contrastive Learning, Question Answering | Unverified
PolySmart @ TRECVid 2024 Video Captioning (VTT) | Dec 20, 2024 | Video Captioning | Unverified
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Dec 17, 2024 | Dense Video Captioning, Descriptive | Code Available
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Dec 16, 2024 | Informativeness, Large Language Model | Code Available
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning | Dec 16, 2024 | Contrastive Learning, Dense Video Captioning | Unverified
Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives | Dec 14, 2024 | Descriptive, Language Modeling | Unverified
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation | Dec 12, 2024 | Phrase Grounding, Question Answering | Unverified
Agent-based Video Trimming | Dec 12, 2024 | Highlight Detection, Moment Retrieval | Unverified
Video LLMs for Temporal Reasoning in Long Videos | Dec 4, 2024 | Action Segmentation, Dense Video Captioning | Unverified
Progress-Aware Video Frame Captioning | Dec 3, 2024 | Image Captioning, Video Captioning | Unverified
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Nov 27, 2024 | Graph Generation, Question Answering | Unverified
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding | Nov 25, 2024 | Dense Video Captioning, Transfer Learning | Unverified
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity | Nov 23, 2024 | Attribute, Cross-Modal Retrieval | Unverified
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning | Nov 22, 2024 | Dense Video Captioning, Video Captioning | Unverified
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Nov 19, 2024 | GPU, Question Answering | Unverified
Multi-Modal interpretable automatic video captioning | Nov 11, 2024 | Decision Making, Video Captioning | Unverified
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Nov 6, 2024 | Video Captioning | Code Available
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities | Nov 4, 2024 | Attribute, Descriptive | Unverified
Technical Report for Soccernet 2023 -- Dense Video Captioning | Oct 31, 2024 | Dense Video Captioning, Video Captioning | Unverified
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features | Oct 22, 2024 | Decoder, Video Captioning | Unverified
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | Oct 20, 2024 | Diagnostic, Video Captioning | Unverified
It's Just Another Day: Unique Video Captioning by Discriminative Prompting | Oct 15, 2024 | Video Captioning | Unverified
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models | Oct 13, 2024 | Cross-Modal Retrieval, Question Answering | Unverified
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning, Large Language Model | Unverified
SoccerNet 2024 Challenges Results | Sep 16, 2024 | Action Spotting, Dense Video Captioning | Code Available
Fine-grained length controllable video captioning with ordinal embeddings | Aug 27, 2024 | Video Captioning | Unverified
LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Aug 19, 2024 | Video Captioning, Video Question Answering | Unverified
Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption Generation, Video Captioning | Code Available
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Jul 30, 2024 | Semantic Role Labeling, Video Captioning | Code Available
Wolf: Captioning Everything with a World Summarization Framework | Jul 26, 2024 | Autonomous Driving, Mixture-of-Experts | Unverified
Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance | Jul 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified
EVLM: An Efficient Vision-Language Model for Visual Understanding | Jul 19, 2024 | Image Captioning, Language Modeling | Unverified
https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Video Captioning, Video Description | Code Available
Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks | Jun 24, 2024 | Question Answering, Text Generation | Unverified
Live Video Captioning | Jun 20, 2024 | Dense Video Captioning, Live Video Captioning | Code Available
GUI Action Narrator: Where and When Did That Action Take Place? | Jun 19, 2024 | Optical Character Recognition (OCR), Video Captioning | Unverified
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | Jun 19, 2024 | Language Modeling, Language Modelling | Unverified
A Survey of Video Datasets for Grounded Event Understanding | Jun 14, 2024 | Common Sense Reasoning, Event Extraction | Code Available
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Jun 10, 2024 | Language Modelling, Large Language Model | Unverified
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges | Jun 4, 2024 | Question Answering, Story Generation | Unverified
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | May 11, 2024 | Image-text matching, Retrieval | Unverified
A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection) | May 2, 2024 | Acoustic Scene Classification, Event Detection | Unverified
The 8th AI City Challenge | Apr 15, 2024 | Dense Video Captioning, Video Captioning | Unverified
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Apr 3, 2024 | Dense Video Captioning, Diversity | Unverified
Streaming Dense Video Captioning | Apr 1, 2024 | Dense Video Captioning, Live Video Captioning | Unverified
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | Mar 24, 2024 | Dense Video Captioning, Temporal Localization | Unverified
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation | Mar 8, 2024 | Articles, Hallucination | Unverified