Improved GUI Grounding via Iterative Narrowing
Anthony Nguyen
- github.com/ant-8/GUI-Grounding-via-Iterative-Narrowing (official, in paper, PyTorch)
Abstract
Graphical User Interface (GUI) grounding plays a crucial role in enhancing the capabilities of Vision-Language Model (VLM) agents. While general VLMs such as GPT-4V perform well across a wide range of tasks, their GUI grounding accuracy remains suboptimal. Recent studies have fine-tuned these models specifically for zero-shot GUI grounding, yielding significant improvements over baseline performance. We introduce a visual prompting framework that employs an iterative narrowing mechanism to further improve the GUI grounding performance of both general and fine-tuned models. We evaluate our method on a comprehensive benchmark comprising various UI platforms and provide the code to reproduce our results.
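
The abstract does not spell out the narrowing procedure, so the following is only a minimal sketch of one plausible reading: the model predicts a point, the view is cropped to a smaller window around that point, and the model is queried again on the crop. The `predict` callable, the `narrow` helper, the number of iterations, and the crop scale are all hypothetical placeholders chosen for illustration, not the author's implementation. Here `predict` receives the current crop box and returns a normalized (x, y) position within it; in practice it would crop the screenshot and query a VLM with the instruction.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), full-image pixels
Point = Tuple[float, float]              # (x, y), full-image pixels


def narrow(box: Box, point: Point, scale: float, bounds: Box) -> Box:
    """Return a window `scale` times the size of `box`, centred on `point`
    and shifted as needed so that it stays inside `bounds`."""
    assert 0.0 < scale <= 1.0, "scale must shrink (or keep) the window"
    left, top, right, bottom = box
    width = (right - left) * scale
    height = (bottom - top) * scale
    b_left, b_top, b_right, b_bottom = bounds
    # Clamp the top-left corner so the window never leaves the screenshot.
    new_left = min(max(point[0] - width / 2, b_left), b_right - width)
    new_top = min(max(point[1] - height / 2, b_top), b_bottom - height)
    return (new_left, new_top, new_left + width, new_top + height)


def iterative_narrowing(
    predict: Callable[[Box], Point],   # hypothetical model wrapper (assumption)
    image_size: Tuple[int, int],       # (width, height) of the screenshot
    iterations: int = 3,               # assumed value, not from the paper
    scale: float = 0.5,                # assumed crop ratio, not from the paper
) -> Point:
    """Refine a predicted target location by repeatedly cropping around it."""
    bounds: Box = (0.0, 0.0, float(image_size[0]), float(image_size[1]))
    box = bounds
    point: Point = (bounds[2] / 2, bounds[3] / 2)
    for _ in range(iterations):
        # The prediction is normalized to [0, 1] relative to the current crop.
        nx, ny = predict(box)
        # Map it back to full-image pixel coordinates.
        point = (box[0] + nx * (box[2] - box[0]),
                 box[1] + ny * (box[3] - box[1]))
        box = narrow(box, point, scale, bounds)
    return point
```

The intuition behind this sketch is that a prediction error that is roughly proportional to the size of the visible region shrinks as the region shrinks, provided the target stays inside each crop. If an early prediction is far enough off that the target falls outside the next window, later iterations cannot recover it, so the crop scale trades refinement speed against robustness.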