SOTAVerified

Video-to-image Affordance Grounding

Given a demonstration video V and a target image I, video-to-image affordance grounding aims to predict an affordance heatmap over the target image, localizing the region a hand interacts with in the video, along with the affordance action (e.g., press, turn).
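The input/output contract of the task can be sketched as a function that maps a demo video and a target image to a per-pixel heatmap plus an action label. This is a minimal illustrative sketch; the function name, shapes, and the placeholder uniform heatmap are assumptions, not any paper's actual interface.

```python
import numpy as np

def ground_affordance(video: np.ndarray, image: np.ndarray):
    """Hypothetical task interface (names and shapes are illustrative).

    video: (T, H, W, 3) demonstration frames showing a hand interaction.
    image: (H, W, 3) target image to ground the affordance in.
    Returns (heatmap, action): a probability map over target pixels and
    an affordance action label. A real model would run video and image
    encoders here; this placeholder returns a uniform heatmap.
    """
    h, w = image.shape[:2]
    heatmap = np.full((h, w), 1.0 / (h * w))  # placeholder: uniform distribution
    action = "press"  # placeholder action from the affordance vocabulary
    return heatmap, action
```

A trained model would concentrate the heatmap mass on the interacted region (e.g., a button) rather than returning a uniform map.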

Papers

Showing 4 of 4 papers

| Title | Status | Hype |
| --- | --- | --- |
| Affordance Grounding from Demonstration Video to Target Image | Code | 1 |
| Demo2Vec: Reasoning Object Affordances From Online Videos | | 0 |
| Learning Visual Affordance Grounding from Demonstration Videos | | 0 |
| Grounded Human-Object Interaction Hotspots from Video | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Hotspot | KLD | 1.47 | | Unverified |
| 2 | HAG-Net (+Hand Box) | KLD | 1.41 | | Unverified |
| 3 | Demo2Vec | KLD | 1.2 | | Unverified |
| 4 | Afformer | KLD | 1.05 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Hotspot | KLD | 1.26 | | Unverified |
| 2 | HAG-Net (+Hand Box) | KLD | 1.21 | | Unverified |
| 3 | Afformer | KLD | 0.97 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Demo2Vec | KLD | 2.34 | | Unverified |
| 2 | Afformer (ResNet-50-FPN encoder) | KLD | 1.55 | | Unverified |
| 3 | Afformer (ViTDet-B encoder) | KLD | 1.51 | | Unverified |
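KLD in the tables above is the Kullback-Leibler divergence between the predicted heatmap and the ground-truth heatmap, each normalized to a probability distribution over pixels; lower is better. A minimal sketch of one common convention (KL(gt || pred), with an epsilon for numerical stability; papers may differ in direction and smoothing):

```python
import numpy as np

def kld(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-12) -> float:
    """KL(gt || pred) between two heatmaps over the same image.

    Both maps are flattened and renormalized to sum to 1; eps guards
    against log(0). Convention is an assumption; check each paper's
    evaluation code for the exact direction and normalization.
    """
    p = gt.astype(np.float64).ravel()
    q = pred.astype(np.float64).ravel()
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Identical heatmaps give a KLD of 0, and the score grows as predicted mass drifts away from the annotated interaction region.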