InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification Action Recognition | Code Available 7
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering Hallucination | Code Available 4
SnAG: Scalable and Accurate Video Grounding | Apr 2, 2024 | Video Grounding Video Understanding | Code Available 4
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency | Jun 2, 2025 | reinforcement-learning Reinforcement Learning | Code Available 2
TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Mar 17, 2025 | Video Grounding | Code Available 2
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Jan 14, 2025 | Feature Compression Language Modeling | Code Available 2
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Jul 21, 2024 | General Knowledge Highlight Detection | Code Available 2
Context-Guided Spatio-Temporal Video Grounding | Jan 3, 2024 | Object Spatio-Temporal Video Grounding | Code Available 2
VTimeLLM: Empower LLM to Grasp Video Moments | Nov 30, 2023 | Dense Video Captioning Temporal Relation Extraction | Code Available 2
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Nov 22, 2023 | Benchmarking Phrase Grounding | Code Available 2
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Mar 24, 2023 | Highlight Detection Moment Retrieval | Code Available 2
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | Mar 23, 2022 | Decoder Highlight Detection | Code Available 2
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos | May 22, 2025 | Natural Language Moment Retrieval Natural Language Queries | Code Available 1
Object-Shot Enhanced Grounding Network for Egocentric Video | May 7, 2025 | Video Grounding | Code Available 1
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding | Mar 13, 2025 | Object Video Grounding | Code Available 1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Mar 9, 2025 | Action Localization Boundary Detection | Code Available 1
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Feb 16, 2025 | Attribute Object | Code Available 1
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Jan 12, 2025 | Dense Video Captioning Video Captioning | Code Available 1
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format | Nov 27, 2024 | Dense Video Captioning Grounded Video Question Answering | Code Available 1
HawkEye: Training Video-Text LLMs for Grounding Text in Videos | Mar 15, 2024 | Video Grounding Video Question Answering | Code Available 1
Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding | Dec 27, 2023 | Sentence Temporal Sentence Grounding | Code Available 1
Grounded Question-Answering in Long Egocentric Videos | Dec 11, 2023 | Video Grounding Video Question Answering | Code Available 1
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Nov 28, 2023 | Contrastive Learning Highlight Detection | Code Available 1
Can I Trust Your Answer? Visually Grounded Video Question Answering | Sep 4, 2023 | Grounded Video Question Answering Question Answering | Code Available 1
Knowing Where to Focus: Event-aware Transformer for Video Grounding | Aug 14, 2023 | Moment Queries Sentence | Code Available 1
Text-Visual Prompting for Efficient 2D Temporal Video Grounding | Mar 9, 2023 | Sentence Video Grounding | Code Available 1
Localizing Moments in Long Video Via Multimodal Guidance | Feb 26, 2023 | Natural Language Moment Retrieval Natural Language Visual Grounding | Code Available 1
Weakly-Supervised Temporal Article Grounding | Oct 22, 2022 | All Articles | Code Available 1
Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding | Sep 27, 2022 | Decoder Spatio-Temporal Video Grounding | Code Available 1
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding | Sep 22, 2022 | Contrastive Learning Video Grounding | Code Available 1
Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding | Apr 18, 2022 | Action Recognition Animal Action Recognition | Code Available 1
TubeDETR: Spatio-Temporal Video Grounding with Transformers | Mar 30, 2022 | Decoder Language-Based Temporal Localization | Code Available 1
Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos | Jan 25, 2022 | Natural Language Queries Sentence | Code Available 1
Detecting Moments and Highlights in Videos via Natural Language Queries | Dec 1, 2021 | Decoder Moment Retrieval | Code Available 1
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding | Sep 10, 2021 | Metric Learning Representation Learning | Code Available 1
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer | Jul 6, 2021 | Image Retrieval Knowledge Distillation | Code Available 1
VLG-Net: Video-Language Graph Matching Network for Video Grounding | Nov 19, 2020 | Graph Matching Moment Retrieval | Code Available 1
Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Nov 10, 2020 | Referring Expression Sentence | Code Available 1
Dense Regression Network for Video Grounding | Apr 7, 2020 | Natural Language Moment Retrieval Natural Language Queries | Code Available 1
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences | Jan 19, 2020 | Form Object | Code Available 1
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Jul 17, 2025 | Video Grounding Video Understanding | Unverified 0
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models | May 24, 2025 | Benchmarking Video Grounding | Unverified 0
Enhancing Weakly Supervised Video Grounding via Diverse Inference Strategies for Boundary and Prediction Selection | Mar 29, 2025 | Prediction Video Grounding | Unverified 0
VideoGEM: Training-free Action Grounding in Videos | Mar 26, 2025 | Video Grounding | Unverified 0
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | Mar 18, 2025 | Language Modeling Language Modelling | Unverified 0
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Jan 28, 2025 | object-detection Object Detection | Unverified 0
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding | Jan 1, 2025 | Action Understanding Spatio-Temporal Video Grounding | Unverified 0
Consistency of Compositional Generalization across Multiple Levels | Dec 18, 2024 | Meta-Learning Question Answering | Code Available 0
Multi-Scale Contrastive Learning for Video Temporal Grounding | Dec 10, 2024 | Contrastive Learning Data Augmentation | Unverified 0
Video LLMs for Temporal Reasoning in Long Videos | Dec 4, 2024 | Action Segmentation Dense Video Captioning | Unverified 0