SOTAVerified

Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries


Papers

Showing 51–100 of 132 papers

Title | Status | Hype
Video Moment Retrieval from Text Queries via Single Frame Annotation | Code | 1
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection | Code | 1
Deconfounded Video Moment Retrieval with Causal Intervention | Code | 1
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | Code | 1
Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval | Code | 0
Anchor-Aware Similarity Cohesion in Target Frames Enables Predicting Temporal Moment Boundaries in 2D | Code | 0
Boundary-Denoising for Video Activity Localization | Code | 0
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos | Code | 0
DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Code | 0
Exploring Temporal Concurrency for Video-Language Representation Learning | Code | 0
Going for GOAL: A Resource for Grounded Football Commentaries | Code | 0
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Code | 0
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains | Code | 0
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | Code | 0
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval | Code | 0
Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval | Code | 0
MVMR: A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors | Code | 0
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | Code | 0
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding | Code | 0
Show and Guide: Instructional-Plan Grounded Vision and Language Model | Code | 0
SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Code | 0
Towards Diverse Temporal Grounding under Single Positive Labels | Code | 0
TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries | Code | 0
UnLoc: A Unified Framework for Video Localization Tasks | Code | 0
Weakly Supervised Video Moment Retrieval From Text Queries | Code | 0
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | - | 0
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment | - | 0
A Survey on Video Moment Localization | - | 0
Agent-based Video Trimming | - | 0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | - | 0
Retrieval Augmented Generation Evaluation for Health Documents | - | 0
Language Guided Networks for Cross-modal Moment Retrieval | - | 0
SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval | - | 0
Interactive Video Corpus Moment Retrieval using Reinforcement Learning | - | 0
Zero-shot Video Moment Retrieval With Off-the-Shelf Models | - | 0
Weakly-Supervised Video Moment Retrieval via Semantic Completion Network | - | 0
wMAN: Weakly-Supervised Moment Alignment Network for Text-Based Video Segment Retrieval | - | 0
SLVideo: A Sign Language Video Moment Retrieval Framework | - | 0
Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels | - | 0
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection | - | 0
Temporal Perceiving Video-Language Pre-training | - | 0
Text-based Localization of Moments in a Video Corpus | - | 0
The Devil is in the Spurious Correlation: Boosting Moment Retrieval via Temporal Dynamic Learning | - | 0
Temporal Sentence Grounding in Videos: A Survey and Future Directions | - | 0
Graph Neural Network for Video Relocalization | - | 0
GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features | - | 0
Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval | - | 0
Generating Adjacency Matrix for Video Relocalization | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | - | Unverified
2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | - | Unverified
3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | - | Unverified
4 | SG-DETR (w/ PT) | mAP | 58.8 | - | Unverified
5 | SG-DETR | mAP | 54.1 | - | Unverified
6 | LLaVA-MR | mAP | 52.73 | - | Unverified
7 | FlashVTG | mAP | 52 | - | Unverified
8 | InternVideo2-6B | mAP | 49.24 | - | Unverified
9 | CG-DETR (w/ PT) | mAP | 47.97 | - | Unverified
10 | VideoLights-B-pt | mAP | 47.94 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | - | Unverified
2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | - | Unverified
3 | FlashVTG | R@1 IoU=0.5 | 70.32 | - | Unverified
4 | SG-DETR | R@1 IoU=0.5 | 70.2 | - | Unverified
5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | - | Unverified
6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | - | Unverified
7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | - | Unverified
8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | - | Unverified
9 | LD-DETR | R@1 IoU=0.5 | 62.58 | - | Unverified
10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | - | Unverified
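
This page does not define the "R@1 IoU=0.5" metric used in the benchmark results. As commonly used in the moment retrieval literature, it is the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below illustrates that reading. It assumes moments are (start, end) pairs in seconds with exactly one ground-truth moment per query, and the function names are hypothetical; individual benchmarks may differ in details.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) intervals, in seconds."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_predictions, ground_truths, iou_threshold=0.5):
    """Percentage of queries whose top-1 moment reaches the IoU threshold.

    Assumes one ground-truth moment per query (an illustrative simplification).
    """
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(top1_predictions, ground_truths)
    )
    return 100.0 * hits / len(ground_truths)


# Two queries: the first prediction overlaps well, the second does not.
predictions = [(2.0, 8.0), (0.0, 10.0)]
targets = [(0.0, 10.0), (5.0, 15.0)]
score = recall_at_1(predictions, targets)
```

Here the first pair has an IoU of 0.6 and the second an IoU of about 0.33, so only one of the two queries counts as a hit at the 0.5 threshold.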