Title | Date | Tags | Status
AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding | Jun 11, 2024 | regression, Video Grounding | Unverified
Cascaded Prediction Network via Segment Tree for Temporal Video Grounding | Jun 19, 2021 | Sentence, Video Grounding | Unverified
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos | Mar 23, 2021 | Referring Expression, Referring Expression Comprehension | Unverified
Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding | Jan 1, 2023 | Object, Spatio-Temporal Video Grounding | Unverified
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Jan 28, 2025 | object-detection, Object Detection | Unverified
Described Spatial-Temporal Video Detection | Jul 8, 2024 | Multi-class Classification, Temporal Localization | Unverified
DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection | Aug 29, 2023 | Denoising, Highlight Detection | Unverified
End-to-End Dense Video Grounding via Parallel Regression | Sep 23, 2021 | regression, Sentence | Unverified
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding | Mar 15, 2022 | Descriptive, Representation Learning | Unverified
Enhancing Weakly Supervised Video Grounding via Diverse Inference Strategies for Boundary and Prediction Selection | Mar 29, 2025 | Prediction, Video Grounding | Unverified
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | Dec 5, 2023 | Boundary Detection, Language Modeling | Unverified
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation | Sep 10, 2021 | Translation, Video Grounding | Unverified
Exploiting Feature Diversity for Make-up Temporal Video Grounding | Aug 12, 2022 | Diversity, Video Grounding | Unverified
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory | Jul 26, 2023 | Contrastive Learning, Video Grounding | Unverified
Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding | Jul 2, 2022 | Spatio-Temporal Video Grounding, Video Grounding | Unverified
Exploiting Auxiliary Caption for Video Grounding | Jan 15, 2023 | Contrastive Learning, Dense Video Captioning | Unverified
Generation-Guided Multi-Level Unified Network for Video Grounding | Mar 14, 2023 | Video Grounding | Unverified
Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization | Oct 10, 2022 | Video Grounding | Unverified
Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding | Jan 1, 2023 | Decoder, Sentence | Unverified
Iterative Proposal Refinement for Weakly-Supervised Video Grounding | Jan 1, 2023 | Sentence, Video Grounding | Unverified
Language-free Training for Zero-shot Video Grounding | Oct 24, 2022 | Video Grounding | Unverified
LLM4VG: Large Language Models Evaluation for Video Grounding | Dec 21, 2023 | Image Captioning, Video Grounding | Unverified
LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach | Dec 19, 2021 | Inductive Bias, Video Grounding | Unverified
Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation | Jan 1, 2022 | Object, Referring Expression Segmentation | Unverified
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding | Dec 21, 2023 | Domain Adaptation, Unsupervised Domain Adaptation | Unverified
Multi-Scale Contrastive Learning for Video Temporal Grounding | Dec 10, 2024 | Contrastive Learning, Data Augmentation | Unverified
Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding | Mar 8, 2022 | Contrastive Learning, Sentence | Unverified
Multi-sentence Video Grounding for Long Video Generation | Jul 18, 2024 | Moment Retrieval, Retrieval | Unverified
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection | Jul 20, 2023 | Boundary Detection, Video Grounding | Unverified
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses | Jun 1, 2019 | All, Clustering | Unverified
Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding | Aug 16, 2020 | Diversity, Object | Unverified
On Pursuit of Designing Multi-modal Transformer for Video Grounding | Sep 13, 2021 | All, Decoder | Unverified
On the Effects of Video Grounding on Language Models | Oct 1, 2022 | Image Captioning, Question Answering | Unverified
Parallel Attention Network with Sequence Matching for Video Grounding | May 18, 2021 | Representation Learning, Video Grounding | Unverified
Position-aware Location Regression Network for Temporal Video Grounding | Apr 12, 2022 | Position, regression | Unverified
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models | May 24, 2025 | Benchmarking, Video Grounding | Unverified
Semi-Supervised Video Paragraph Grounding With Contrastive Encoder | Jan 1, 2022 | Sentence, Video Grounding | Unverified
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding | Nov 25, 2024 | Dense Video Captioning, Transfer Learning | Unverified
Team PKU-WICT-MIPL PIC Makeup Temporal Video Grounding Challenge 2022 Technical Report | Jul 6, 2022 | Sentence, Temporal Localization | Unverified
Unsupervised Temporal Video Grounding with Deep Semantic Clustering | Jan 14, 2022 | Clustering, Sentence | Unverified
VideoGEM: Training-free Action Grounding in Videos | Mar 26, 2025 | Video Grounding | Unverified
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | Dec 31, 2023 | Spatio-Temporal Video Grounding, Video Grounding | Unverified
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding | Jan 1, 2024 | Spatio-Temporal Video Grounding, Video Grounding | Unverified
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Jul 17, 2025 | Video Grounding, Video Understanding | Unverified
Video LLMs for Temporal Reasoning in Long Videos | Dec 4, 2024 | Action Segmentation, Dense Video Captioning | Unverified
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | May 7, 2024 | Large Language Model, Multimodal Large Language Model | Unverified
ViGT: Proposal-free Video Grounding with Learnable Token in Transformer | Aug 11, 2023 | Feature Correlation, regression | Unverified
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses | Aug 3, 2024 | Natural Language Queries, Video Grounding | Unverified
Dense Video Object Captioning from Disjoint Supervision | Jun 20, 2023 | Object, Sentence | Code Available
A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge | Nov 16, 2022 | Action Localization, Natural Language Queries | Code Available