
Video Visual Relation Detection

Video Visual Relation Detection (VidVRD) aims to detect instances of visual relations of interest in a video, where a visual relation instance is represented by a relation triplet together with the trajectories of the subject and object. Compared to still images, videos provide a richer set of cues for detecting visual relations, including dynamic relations such as "A-follow-B" and "A-towards-B", and temporally changing relations such as "A-chase-B" followed by "A-hold-B". Yet VidVRD is technically more challenging than its still-image counterpart (ImgVRD) because of the difficulty of accurate object tracking and the diversity of relation appearances in the video domain.

Source: ImageNet-VidVRD Video Visual Relation Dataset
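The relation-triplet-plus-trajectories representation described above can be sketched in code. The following is a minimal, illustrative model only; the class and field names are assumptions for exposition and do not reflect the actual annotation schema of the ImageNet-VidVRD dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# A bounding box as (x1, y1, x2, y2) in pixel coordinates (illustrative).
BBox = Tuple[float, float, float, float]

@dataclass
class Tracklet:
    """An object trajectory: a category label plus per-frame boxes."""
    category: str
    boxes: Dict[int, BBox] = field(default_factory=dict)  # frame index -> box

@dataclass
class RelationInstance:
    """A visual relation instance: a <subject, predicate, object> triplet
    grounded by two tracklets over the span [begin_frame, end_frame)."""
    subject: Tracklet
    predicate: str
    obj: Tracklet
    begin_frame: int
    end_frame: int

    def triplet(self) -> Tuple[str, str, str]:
        return (self.subject.category, self.predicate, self.obj.category)

# A temporally changing relation is modelled as successive instances over
# the same subject/object pair, e.g. "dog-chase-cat" then "dog-hold-cat".
dog = Tracklet("dog", {t: (10.0 + t, 20.0, 50.0 + t, 60.0) for t in range(60)})
cat = Tracklet("cat", {t: (80.0 + t, 20.0, 110.0 + t, 55.0) for t in range(60)})

relations = [
    RelationInstance(dog, "chase", cat, begin_frame=0, end_frame=30),
    RelationInstance(dog, "hold", cat, begin_frame=30, end_frame=60),
]

print([r.triplet() for r in relations])
```

Keeping the trajectories separate from the triplet makes the temporal grounding explicit: the same subject/object pair can participate in different predicates over disjoint frame spans.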

Papers

Showing 1-10 of 15 papers

Title | Status | Hype
What and When to Look?: Temporal Span Proposal Network for Video Relation Detection | Code | 1
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection | Code | 1
LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos | Code | 1
Spatial-Temporal Transformer for Dynamic Scene Graph Generation | Code | 1
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos | Code | 1
Video Relation Detection via Tracklet based Visual Transformer | Code | 1
VrdONE: One-stage Video Visual Relation Detection | Code | 1
OpenVidVRD: Open-Vocabulary Video Visual Relation Detection via Prompt-Driven Semantic Space Alignment | | 0
VRDFormer: End-to-End Video Visual Relation Detection With Transformers | | 0
Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context | | 0

No leaderboard results yet.