SOTAVerified

Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 51-100 of 132 papers

Title | Status | Hype
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | Code | 0
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Code | 3
GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features | - | 0
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Code | 0
Event-aware Video Corpus Moment Retrieval | - | 0
Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval | - | 0
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Code | 2
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval | Code | 1
Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval | Code | 0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | - | 0
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | Code | 1
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Code | 1
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Code | 2
SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval | - | 0
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains | Code | 0
DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection | - | 0
UnLoc: A Unified Framework for Video Localization Tasks | Code | 0
MVMR: A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors | Code | 0
UniVTG: Towards Unified Video-Language Temporal Grounding | Code | 2
MomentDiff: Generative Video Moment Retrieval from Random to Real | Code | 1
A Survey on Video Moment Localization | - | 0
Background-aware Moment Detection for Video Moment Retrieval | Code | 1
Faster Video Moment Retrieval with Point-Level Supervision | - | 0
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | Code | 1
TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | Code | 2
Boundary-Denoising for Video Activity Localization | Code | 0
Hierarchical Video-Moment Retrieval and Step-Captioning | Code | 1
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Code | 2
Towards Diverse Temporal Grounding under Single Positive Labels | Code | 0
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training | - | 0
Interactive Video Corpus Moment Retrieval using Reinforcement Learning | - | 0
Multi-video Moment Ranking with Multimodal Clue | - | 0
Temporal Perceiving Video-Language Pre-training | - | 0
Exploring Temporal Concurrency for Video-Language Representation Learning | Code | 0
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Code | 1
SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Code | 0
Going for GOAL: A Resource for Grounded Football Commentaries | Code | 0
Zero-shot Video Moment Retrieval With Off-the-Shelf Models | - | 0
FedVMR: A New Federated Learning method for Video Moment Retrieval | - | 0
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval | Code | 0
Selective Query-guided Debiasing for Video Corpus Moment Retrieval | Code | 1
Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | - | 0
Partially Relevant Video Retrieval | Code | 1
Cross-Lingual Cross-Modal Consolidation for Effective Multilingual Video Corpus Moment Retrieval | - | 0
You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos | Code | 1
Video Moment Retrieval from Text Queries via Single Frame Annotation | Code | 1
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval | - | 0
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | Code | 2
Temporal Sentence Grounding in Videos: A Survey and Future Directions | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | - | Unverified
2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | - | Unverified
3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | - | Unverified
4 | SG-DETR (w/ PT) | mAP | 58.8 | - | Unverified
5 | SG-DETR | mAP | 54.1 | - | Unverified
6 | LLaVA-MR | mAP | 52.73 | - | Unverified
7 | FlashVTG | mAP | 52 | - | Unverified
8 | InternVideo2-6B | mAP | 49.24 | - | Unverified
9 | CG-DETR (w/ PT) | mAP | 47.97 | - | Unverified
10 | VideoLights-B-pt | mAP | 47.94 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | - | Unverified
2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | - | Unverified
3 | FlashVTG | R@1 IoU=0.5 | 70.32 | - | Unverified
4 | SG-DETR | R@1 IoU=0.5 | 70.2 | - | Unverified
5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | - | Unverified
6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | - | Unverified
7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | - | Unverified
8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | - | Unverified
9 | LD-DETR | R@1 IoU=0.5 | 61.96 | - | Unverified
10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | - | Unverified
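
For readers unfamiliar with the "R@1 IoU=0.5" metric in the tables above, the following is a minimal sketch of how such a score is commonly computed: the top-ranked predicted moment for each query counts as a hit if its temporal intersection-over-union with the ground-truth moment is at least 0.5. The function names (`temporal_iou`, `recall_at_1`), the single-ground-truth assumption, and the `>=` comparison are illustrative assumptions; individual benchmarks may handle multiple ground-truth moments or ties differently.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) intervals, in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold=0.5):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes exactly one ground-truth moment per query (an assumption of
    this sketch, not a property of every benchmark).
    """
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Hypothetical example data: two queries, one hit and one miss.
predictions = [(0.0, 10.0), (2.0, 8.0)]
ground_truth = [(0.0, 8.0), (5.0, 15.0)]
score = recall_at_1(predictions, ground_truth)
```

In this made-up example the first prediction overlaps its ground truth with IoU 0.8 and the second with IoU of about 0.23, so half of the queries count as hits.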