SOTAVerified

Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 101–132 of 132 papers

| Title | Status | Hype |
| --- | --- | --- |
| Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking | | 0 |
| Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training | | 0 |
| Video Moment Retrieval via Natural Language Queries | | 0 |
| Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair | | 0 |
| ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | | 0 |
| Weakly-Supervised Video Moment Retrieval via Semantic Completion Network | | 0 |
| wMAN: WEAKLY-SUPERVISED MOMENT ALIGNMENT NETWORK FOR TEXT-BASED VIDEO SEGMENT RETRIEVAL | | 0 |
| LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval | | 0 |
| Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models | | 0 |
| Zero-shot Video Moment Retrieval With Off-the-Shelf Models | | 0 |
| Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval | Code | 0 |
| UnLoc: A Unified Framework for Video Localization Tasks | Code | 0 |
| Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval | Code | 0 |
| LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | Code | 0 |
| Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains | Code | 0 |
| Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Code | 0 |
| Going for GOAL: A Resource for Grounded Football Commentaries | Code | 0 |
| Boundary-Denoising for Video Activity Localization | Code | 0 |
| Weakly Supervised Video Moment Retrieval From Text Queries | Code | 0 |
| Exploring Temporal Concurrency for Video-Language Representation Learning | Code | 0 |
| Anchor-Aware Similarity Cohesion in Target Frames Enables Predicting Temporal Moment Boundaries in 2D | Code | 0 |
| DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Code | 0 |
| SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Code | 0 |
| Towards Diverse Temporal Grounding under Single Positive Labels | Code | 0 |
| Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos | Code | 0 |
| Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval | Code | 0 |
| Show and Guide: Instructional-Plan Grounded Vision and Language Model | Code | 0 |
| SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding | Code | 0 |
| TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries | Code | 0 |
| R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | Code | 0 |
| MVMR: A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors | Code | 0 |

Benchmark Results

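| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | | Unverified |
| 2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | | Unverified |
| 3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | | Unverified |
| 4 | SG-DETR (w/ PT) | mAP | 58.8 | | Unverified |
| 5 | SG-DETR | mAP | 54.1 | | Unverified |
| 6 | LLaVA-MR | mAP | 52.73 | | Unverified |
| 7 | FlashVTG | mAP | 52 | | Unverified |
| 8 | InternVideo2-6B | mAP | 49.24 | | Unverified |
| 9 | CG-DETR (w/ PT) | mAP | 47.97 | | Unverified |
| 10 | VideoLights-B-pt | mAP | 47.94 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | | Unverified |
| 2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | | Unverified |
| 3 | FlashVTG | R@1 IoU=0.5 | 70.32 | | Unverified |
| 4 | SG-DETR | R@1 IoU=0.5 | 70.2 | | Unverified |
| 5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | | Unverified |
| 6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | | Unverified |
| 7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | | Unverified |
| 8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | | Unverified |
| 9 | LD-DETR | R@1 IoU=0.5 | 62.58 | | Unverified |
| 10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | | Unverified |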
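
The R@1 IoU=0.5 metric in the benchmark results is commonly read as the share of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below illustrates that reading only. It assumes one ground-truth span per query, spans given as (start, end) in seconds, and a `>=` comparison at the threshold; individual benchmarks may differ on these points. The names `temporal_iou` and `recall_at_1` are illustrative and do not come from any listed paper or codebase.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes exactly one ground-truth span per query (an assumption of this
    sketch, not a property of every benchmark).
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per ground-truth span")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    preds = [(2.0, 10.0), (0.0, 10.0)]
    truth = [(0.0, 10.0), (20.0, 30.0)]
    # First query overlaps with IoU 0.8 (a hit); second has no overlap (a miss).
    print(recall_at_1(preds, truth))
```

Under these assumptions, a prediction of 2 s to 10 s against a ground truth of 0 s to 10 s has IoU 8/10 = 0.8 and counts as a hit, while a prediction that does not overlap the ground truth at all has IoU 0 and counts as a miss.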