VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Jul 17, 2025 | Video Grounding, Video Understanding | Unverified | 0
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency | Jun 2, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models | May 24, 2025 | Benchmarking, Video Grounding | Unverified | 0
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos | May 22, 2025 | Natural Language Moment Retrieval, Natural Language Queries | Code Available | 1
Object-Shot Enhanced Grounding Network for Egocentric Video | May 7, 2025 | Video Grounding | Code Available | 1
Enhancing Weakly Supervised Video Grounding via Diverse Inference Strategies for Boundary and Prediction Selection | Mar 29, 2025 | Prediction, Video Grounding | Unverified | 0
VideoGEM: Training-free Action Grounding in Videos | Mar 26, 2025 | Video Grounding | Unverified | 0
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | Mar 18, 2025 | Language Modeling, Language Modelling | Unverified | 0
TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Mar 17, 2025 | Video Grounding | Code Available | 2
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding | Mar 13, 2025 | Object, Video Grounding | Code Available | 1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Mar 9, 2025 | Action Localization, Boundary Detection | Code Available | 1
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Feb 16, 2025 | Attribute, Object | Code Available | 1
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Jan 28, 2025 | object-detection, Object Detection | Unverified | 0
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Jan 14, 2025 | Feature Compression, Language Modeling | Code Available | 2
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Jan 12, 2025 | Dense Video Captioning, Video Captioning | Code Available | 1
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding | Jan 1, 2025 | Action Understanding, Spatio-Temporal Video Grounding | Unverified | 0
Consistency of Compositional Generalization across Multiple Levels | Dec 18, 2024 | Meta-Learning, Question Answering | Code Available | 0
Multi-Scale Contrastive Learning for Video Temporal Grounding | Dec 10, 2024 | Contrastive Learning, Data Augmentation | Unverified | 0
Video LLMs for Temporal Reasoning in Long Videos | Dec 4, 2024 | Action Segmentation, Dense Video Captioning | Unverified | 0
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format | Nov 27, 2024 | Dense Video Captioning, Grounded Video Question Answering | Code Available | 1
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding | Nov 25, 2024 | Dense Video Captioning, Transfer Learning | Unverified | 0
SimBase: A Simple Baseline for Temporal Video Grounding | Nov 12, 2024 | Video Grounding | Unverified | 0
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses | Aug 3, 2024 | Natural Language Queries, Video Grounding | Unverified | 0
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Jul 21, 2024 | General Knowledge, Highlight Detection | Code Available | 2
Multi-sentence Video Grounding for Long Video Generation | Jul 18, 2024 | Moment Retrieval, Retrieval | Unverified | 0
Described Spatial-Temporal Video Detection | Jul 8, 2024 | Multi-class Classification, Temporal Localization | Unverified | 0
AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding | Jun 11, 2024 | regression, Video Grounding | Unverified | 0
Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network | Jun 7, 2024 | Depth Estimation, Depth Prediction | Unverified | 0
Artemis: Towards Referential Understanding in Complex Videos | Jun 1, 2024 | Text Summarization, Video Grounding | Code Available | 0
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | May 7, 2024 | Large Language Model, Multimodal Large Language Model | Unverified | 0
SnAG: Scalable and Accurate Video Grounding | Apr 2, 2024 | Video Grounding, Video Understanding | Code Available | 4
SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding | Apr 1, 2024 | Mamba, State Space Models | Unverified | 0
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding | Mar 21, 2024 | Video Grounding | Code Available | 0
HawkEye: Training Video-Text LLMs for Grounding Text in Videos | Mar 15, 2024 | Video Grounding, Video Question Answering | Code Available | 1
Context-Guided Spatio-Temporal Video Grounding | Jan 3, 2024 | Object, Spatio-Temporal Video Grounding | Code Available | 2
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | Jan 1, 2024 | Spatio-Temporal Video Grounding, Video Grounding | Unverified | 0
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | Dec 31, 2023 | Spatio-Temporal Video Grounding, Video Grounding | Unverified | 0
Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding | Dec 27, 2023 | Sentence, Temporal Sentence Grounding | Code Available | 1
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding | Dec 21, 2023 | Domain Adaptation, Unsupervised Domain Adaptation | Unverified | 0
LLM4VG: Large Language Models Evaluation for Video Grounding | Dec 21, 2023 | Image Captioning, Video Grounding | Unverified | 0
Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval | Dec 12, 2023 | Contrastive Learning, Moment Retrieval | Code Available | 0
Grounded Question-Answering in Long Egocentric Videos | Dec 11, 2023 | Video Grounding, Video Question Answering | Code Available | 1
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | Dec 5, 2023 | Boundary Detection, Language Modeling | Unverified | 0
VTimeLLM: Empower LLM to Grasp Video Moments | Nov 30, 2023 | Dense Video Captioning, Temporal Relation Extraction | Code Available | 2
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Nov 28, 2023 | Contrastive Learning, Highlight Detection | Code Available | 1
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Nov 22, 2023 | Benchmarking, Phrase Grounding | Code Available | 2
Exploring Iterative Refinement with Diffusion Models for Video Grounding | Oct 26, 2023 | Sentence, Video Grounding | Code Available | 0
Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding | Sep 12, 2023 | Sentence, text similarity | Code Available | 0