Mapping Natural Language Commands to Web Elements
Panupong Pasupat, Tian-Shun Jiang, Evan Zheran Liu, Kelvin Guu, Percy Liang
Code
- github.com/stanfordnlp/phrasenode (official, in paper, PyTorch)
- worksheets.codalab.org/worksheets/0x0097f249cd944284a81af331093c3579 (official)
Abstract
The web provides a rich, open-domain environment with textual, structural, and spatial properties. We propose a new task for grounding language in this environment: given a natural language command (e.g., "click on the second article"), choose the correct element on the web page (e.g., a hyperlink or text box). We collected a dataset of over 50,000 commands that capture various phenomena such as functional references (e.g., "find who made this site"), relational reasoning (e.g., "article by john"), and visual reasoning (e.g., "top-most article"). We also implemented and analyzed three baseline models that capture different phenomena present in the dataset.
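
To make the task's input and output concrete, the following is a minimal sketch in Python. It is not one of the paper's three baseline models: the `Element` class, the `choose_element` function, and the toy page are hypothetical names introduced only for illustration. The sketch assumes each candidate element is reduced to its tag and visible text, and it scores elements by plain word overlap with the command, so it ignores the structural and spatial properties that the abstract highlights.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical, simplified view of a web element: its tag and visible text."""
    tag: str
    text: str


def tokenize(s):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def choose_element(command, elements):
    """Return the element whose text shares the most tokens with the command.

    Illustrative lexical-overlap heuristic only; ties go to the earliest element.
    """
    command_tokens = tokenize(command)
    return max(elements, key=lambda el: len(command_tokens & tokenize(el.text)))


# Toy page with three candidate elements (made-up data).
page = [
    Element("a", "Deep learning news"),
    Element("a", "Article by John"),
    Element("input", "Search"),
]

best = choose_element("article by john", page)
print(best.tag, best.text)
```

A heuristic like this can only succeed when the command repeats words from the element's text. Commands such as "top-most article" or "find who made this site" share little or no text with their targets, which is why the dataset's visual and functional references call for models that look beyond lexical overlap.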