SOTAVerified

Video-to-image Affordance Grounding

Given a demonstration video V and a target image I, video-to-image affordance grounding aims to predict an affordance heatmap over the target image, localizing the region a hand interacts with in the video, along with the affordance action (e.g., press, turn).
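The input/output contract of the task can be sketched as a function that maps a demo video and a target image to a per-pixel heatmap plus an action label. This is a minimal illustrative sketch; the function name, shapes, and the placeholder uniform heatmap are assumptions, not any paper's actual interface.

```python
import numpy as np

def ground_affordance(video: np.ndarray, image: np.ndarray):
    """Hypothetical task interface (names and shapes are illustrative).

    video: (T, H, W, 3) demonstration frames showing a hand interaction.
    image: (H, W, 3) target image to ground the affordance in.
    Returns (heatmap, action): a probability map over target pixels and
    an affordance action label. A real model would run video and image
    encoders here; this placeholder returns a uniform heatmap.
    """
    h, w = image.shape[:2]
    heatmap = np.full((h, w), 1.0 / (h * w))  # placeholder: uniform distribution
    action = "press"  # placeholder action from the affordance vocabulary
    return heatmap, action
```

A trained model would concentrate the heatmap mass on the interacted region (e.g., a button) rather than returning a uniform map.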

Papers

Showing 4 of 4 papers

| Title | Status | Hype |
| --- | --- | --- |
| Affordance Grounding from Demonstration Video to Target Image | Code | 1 |
| Demo2Vec: Reasoning Object Affordances From Online Videos | | 0 |
| Learning Visual Affordance Grounding from Demonstration Videos | | 0 |
| Grounded Human-Object Interaction Hotspots from Video | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Hotspot | KLD | 1.47 | | Unverified |
| 2 | HAG-Net (+Hand Box) | KLD | 1.41 | | Unverified |
| 3 | Demo2Vec | KLD | 1.2 | | Unverified |
| 4 | Afformer | KLD | 1.05 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Hotspot | KLD | 1.26 | | Unverified |
| 2 | HAG-Net (+Hand Box) | KLD | 1.21 | | Unverified |
| 3 | Afformer | KLD | 0.97 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Demo2Vec | KLD | 2.34 | | Unverified |
| 2 | Afformer (ResNet-50-FPN encoder) | KLD | 1.55 | | Unverified |
| 3 | Afformer (ViTDet-B encoder) | KLD | 1.51 | | Unverified |
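KLD in the tables above is the Kullback-Leibler divergence between the predicted heatmap and the ground-truth heatmap, each normalized to a probability distribution over pixels; lower is better. A minimal sketch of one common convention (KL(gt || pred), with an epsilon for numerical stability; papers may differ in direction and smoothing):

```python
import numpy as np

def kld(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-12) -> float:
    """KL(gt || pred) between two heatmaps over the same image.

    Both maps are flattened and renormalized to sum to 1; eps guards
    against log(0). Convention is an assumption; check each paper's
    evaluation code for the exact direction and normalization.
    """
    p = gt.astype(np.float64).ravel()
    q = pred.astype(np.float64).ravel()
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Identical heatmaps give a KLD of 0, and the score grows as predicted mass drifts away from the annotated interaction region.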