UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Jul 15, 2025 | Video Captioning, Video Understanding | Code Available | 1
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | Jun 25, 2025 | Dense Video Captioning, Descriptive | Unverified | 0
Dense Video Captioning using Graph-based Sentence Summarization | Jun 25, 2025 | Dense Video Captioning, Sentence | Unverified | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Jun 18, 2025 | Audio captioning, Large Language Model | Code Available | 2
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | Jun 10, 2025 | Multiple-choice, Open-Ended Question Answering | Unverified | 0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs | Jun 9, 2025 | Descriptive, Form | Unverified | 0
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | May 22, 2025 | Caption Generation, Video Captioning | Unverified | 0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | May 19, 2025 | Video Captioning | Code Available | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | Apr 24, 2025 | Caption Generation, Dense Video Captioning | Unverified | 0
Describe Anything: Detailed Localized Image and Video Captioning | Apr 22, 2025 | Sentence, Video Captioning | Unverified | 0
FocusedAD: Character-centric Movie Audio Description | Apr 16, 2025 | Video Captioning | Code Available | 0
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | Apr 7, 2025 | Boundary Detection, Object | Code Available | 2
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning | Mar 31, 2025 | Video Captioning | Unverified | 0
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding | Mar 14, 2025 | Denoising, Dense Video Captioning | Unverified | 0
Get In Video: Add Anything You Want to the Video | Mar 8, 2025 | object-detection, Object Detection | Unverified | 0
Fine-Grained Video Captioning through Scene Graph Consolidation | Feb 23, 2025 | Caption Generation, Image Captioning | Unverified | 0
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | Feb 21, 2025 | Caption Generation, Video Captioning | Unverified | 0
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning | Feb 19, 2025 | Knowledge Distillation, Object | Unverified | 0
Pretrained Image-Text Models are Secretly Video Captioners | Feb 19, 2025 | Image Captioning, Video Captioning | Code Available | 0
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Feb 18, 2025 | Text-to-Video Generation, Video Captioning | Code Available | 1
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Feb 9, 2025 | Image Captioning, Image-text Retrieval | Code Available | 3
MAMS: Model-Agnostic Module Selection Framework for Video Captioning | Jan 30, 2025 | Caption Generation, Video Captioning | Unverified | 0
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Jan 12, 2025 | Dense Video Captioning, Video Captioning | Code Available | 1
Classifier-Guided Captioning Across Modalities | Jan 3, 2025 | Audio captioning, Video Captioning | Unverified | 0
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU, Question Answering | Unverified | 0
Event-Equalized Dense Video Captioning | Jan 1, 2025 | Dense Video Captioning, Video Captioning | Unverified | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | Dec 31, 2024 | Retrieval, Text Retrieval | Unverified | 0
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Dec 30, 2024 | Contrastive Learning, Question Answering | Code Available | 0
PolySmart @ TRECVid 2024 Video Captioning (VTT) | Dec 20, 2024 | Video Captioning | Unverified | 0
HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning | Dec 19, 2024 | Dense Video Captioning, Video Captioning | Code Available | 1
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Dec 18, 2024 | Image Captioning, Video Captioning | Code Available | 1
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Dec 17, 2024 | Dense Video Captioning, Descriptive | Code Available | 0
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning | Dec 16, 2024 | Contrastive Learning, Dense Video Captioning | Unverified | 0
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Dec 16, 2024 | Informativeness, Large Language Model | Code Available | 0
Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives | Dec 14, 2024 | Descriptive, Language Modeling | Unverified | 0
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation | Dec 12, 2024 | Phrase Grounding, Question Answering | Unverified | 0
Agent-based Video Trimming | Dec 12, 2024 | Highlight Detection, Moment Retrieval | Unverified | 0
Video LLMs for Temporal Reasoning in Long Videos | Dec 4, 2024 | Action Segmentation, Dense Video Captioning | Unverified | 0
Progress-Aware Video Frame Captioning | Dec 3, 2024 | Image Captioning, Video Captioning | Unverified | 0
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Nov 27, 2024 | Graph Generation, Question Answering | Unverified | 0
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format | Nov 27, 2024 | Dense Video Captioning, Grounded Video Question Answering | Code Available | 1
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding | Nov 25, 2024 | Dense Video Captioning, Transfer Learning | Unverified | 0
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity | Nov 23, 2024 | Attribute, Cross-Modal Retrieval | Unverified | 0
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning | Nov 22, 2024 | Dense Video Captioning, Video Captioning | Unverified | 0
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Nov 19, 2024 | GPU, Question Answering | Unverified | 0
Multi-Modal interpretable automatic video captioning | Nov 11, 2024 | Decision Making, Video Captioning | Unverified | 0
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Nov 6, 2024 | Video Captioning | Code Available | 0
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities | Nov 4, 2024 | Attribute, Descriptive | Unverified | 0
Technical Report for Soccernet 2023 -- Dense Video Captioning | Oct 31, 2024 | Dense Video Captioning, Video Captioning | Unverified | 0
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features | Oct 22, 2024 | Decoder, Video Captioning | Unverified | 0