Video-to-image Affordance Grounding
Given a demonstration video V and a target image I, the goal of video-to-image affordance grounding is to predict an affordance heatmap over the target image that corresponds to the hand-interacted region in the video, together with the affordance action (e.g., press, turn).
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Hotspot | KLD | 1.26 | — | Unverified |
| 2 | HAG-Net (+Hand Box) | KLD | 1.21 | — | Unverified |
| 3 | Afformer | KLD | 0.97 | — | Unverified |
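
The KLD column reports the Kullback-Leibler divergence between the predicted and ground-truth affordance heatmaps, so lower values are better. The sketch below shows one common way to compute it, assuming both heatmaps are non-negative 2D grids that are normalized to sum to 1 before comparison. The function names (`normalize`, `kld`) and the smoothing constant `EPS` are illustrative assumptions, not taken from any of the listed models' evaluation code, and the exact smoothing used by each benchmark may differ.

```python
import math

EPS = 1e-12  # assumed smoothing constant; actual benchmarks may use another value


def normalize(heatmap):
    """Flatten a 2D heatmap (list of rows) into a distribution that sums to 1."""
    flat = [float(v) for row in heatmap for v in row]
    if any(v < 0 for v in flat):
        raise ValueError("heatmap values must be non-negative")
    total = sum(flat)
    if total <= 0:
        raise ValueError("heatmap must contain some positive mass")
    return [v / total for v in flat]


def kld(pred, gt, eps=EPS):
    """KL divergence of the ground-truth distribution from the prediction.

    Both inputs are 2D heatmaps of the same shape. Lower is better;
    the value approaches 0 when the prediction matches the ground truth.
    """
    p = normalize(pred)
    q = normalize(gt)
    if len(p) != len(q):
        raise ValueError("heatmaps must have the same number of pixels")
    # Sum over pixels of q * log(q / p), with eps guarding against division by zero.
    return sum(qi * math.log(eps + qi / (pi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    gt = [[1.0, 0.0], [0.0, 0.0]]          # all mass on one pixel
    uniform_pred = [[1.0, 1.0], [1.0, 1.0]]  # uninformative prediction
    print(kld(gt, gt))            # close to 0
    print(kld(uniform_pred, gt))  # close to log(4)
```

In this formulation a prediction that spreads mass uniformly is penalized in proportion to how concentrated the ground truth is, which is why a sharper, well-placed heatmap scores a lower KLD.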