SOTAVerified

Video Narrative Grounding

Video Narrative Grounding is the task of linking video narratives to specific video segments. The input is a video with a text description (the narrative) and the positions of certain nouns marked. For each marked noun, the method must output a segmentation mask for the object it refers to, in each video frame.

Source: Connecting Vision and Language with Video Localized Narratives
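The input and output described above can be sketched as a small data structure. This is only an illustrative sketch: the names `MarkedNoun`, `VNGSample`, `empty_prediction`, and `is_valid_prediction` are hypothetical and are not taken from the paper or any released code, and marking nouns by character offsets is an assumption about how the positions might be represented.

```python
from dataclasses import dataclass
from typing import Dict, List

# One binary mask for one frame: height x width grid of booleans (assumed layout).
Mask = List[List[bool]]


@dataclass
class MarkedNoun:
    """Hypothetical marker: character span of a noun inside the narrative."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass
class VNGSample:
    """Hypothetical container for one task input: a video plus its narrative."""
    num_frames: int
    height: int
    width: int
    narrative: str
    nouns: List[MarkedNoun]

    def noun_text(self, i: int) -> str:
        """Return the text of the i-th marked noun."""
        noun = self.nouns[i]
        return self.narrative[noun.start:noun.end]


def empty_prediction(sample: VNGSample) -> Dict[int, List[Mask]]:
    """Placeholder method: an all-background mask per marked noun and frame."""
    return {
        i: [
            [[False] * sample.width for _ in range(sample.height)]
            for _ in range(sample.num_frames)
        ]
        for i in range(len(sample.nouns))
    }


def is_valid_prediction(sample: VNGSample, pred: Dict[int, List[Mask]]) -> bool:
    """Check that there is one correctly sized mask per noun per frame."""
    if set(pred) != set(range(len(sample.nouns))):
        return False
    for masks in pred.values():
        if len(masks) != sample.num_frames:
            return False
        for mask in masks:
            if len(mask) != sample.height:
                return False
            if any(len(row) != sample.width for row in mask):
                return False
    return True


# Toy example with a made-up narrative and a 2-frame, 3x4 "video".
narrative = "A dog chases a ball."
dog_start = narrative.index("dog")
ball_start = narrative.index("ball")
sample = VNGSample(
    num_frames=2,
    height=3,
    width=4,
    narrative=narrative,
    nouns=[
        MarkedNoun(dog_start, dog_start + len("dog")),
        MarkedNoun(ball_start, ball_start + len("ball")),
    ],
)
pred = empty_prediction(sample)
```

A real method would replace `empty_prediction` with a model that fills in each mask; the shape check only illustrates the output contract of one mask per marked noun per frame.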

Papers

Showing 1-3 of 3 papers

Title | Status | Hype
Connecting Vision and Language with Video Localized Narratives | Code | 1
Point-VOS: Pointing Up Video Object Segmentation | | 0
Transformer with Controlled Attention for Synchronous Motion Captioning | Code | 0

No leaderboard results yet.