Improved GUI Grounding via Iterative Narrowing

2024-11-18

Anthony Nguyen

Code

github.com/ant-8/GUI-Grounding-via-Iterative-Narrowing
Official · In paper · PyTorch · ★ 10

Abstract

Graphical User Interface (GUI) grounding plays a crucial role in enhancing the capabilities of Vision-Language Model (VLM) agents. While general VLMs, such as GPT-4V, demonstrate strong performance across various tasks, their proficiency in GUI grounding remains suboptimal. Recent studies have focused on fine-tuning these models specifically for zero-shot GUI grounding, yielding significant improvements over baseline performance. We introduce a visual prompting framework that employs an iterative narrowing mechanism to further improve the performance of both general and fine-tuned models in GUI grounding. For evaluation, we tested our method on a comprehensive benchmark comprising various UI platforms and provided the code to reproduce our results.
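
The abstract does not spell out the narrowing procedure, so the following is only a minimal sketch of one plausible reading: the model predicts a point, the image is cropped to a smaller window around that point, and the model is queried again on the crop. The names `iterative_narrowing`, `clamp_box`, and `predict`, as well as the `iterations` and `shrink` parameters, are hypothetical and are not taken from the paper or its repository. `predict` stands in for any VLM call that returns a normalized (x, y) location inside the crop it is shown.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
Point = Tuple[float, float]              # (x, y) in pixels


def clamp_box(cx: float, cy: float, w: float, h: float,
              img_w: float, img_h: float) -> Box:
    """Return a w-by-h box centered near (cx, cy), shifted to stay inside the image."""
    left = min(max(cx - w / 2, 0.0), img_w - w)
    top = min(max(cy - h / 2, 0.0), img_h - h)
    return (left, top, left + w, top + h)


def iterative_narrowing(predict: Callable[[Box], Point],
                        img_w: float, img_h: float,
                        iterations: int = 3,       # assumed value, not from the paper
                        shrink: float = 0.5) -> Point:  # assumed value, not from the paper
    """Refine a predicted location by repeatedly cropping around it.

    `predict(box)` is a hypothetical stand-in for a model call: it is shown
    the region `box` and returns a normalized (x, y) in [0, 1] within it.
    """
    box: Box = (0.0, 0.0, float(img_w), float(img_h))
    point: Point = (img_w / 2, img_h / 2)
    for _ in range(iterations):
        left, top, right, bottom = box
        nx, ny = predict(box)
        # Map the crop-relative prediction back to full-image pixel coordinates.
        point = (left + nx * (right - left), top + ny * (bottom - top))
        # Narrow the search window around the current prediction.
        box = clamp_box(point[0], point[1],
                        (right - left) * shrink, (bottom - top) * shrink,
                        img_w, img_h)
    return point
```

In practice `predict` would crop the screenshot to `box`, send the crop and the instruction to the model, and parse the returned coordinates. Keeping it as a callback lets the same loop wrap either a general or a fine-tuned model.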

Tasks

Language Modeling · Natural Language Visual Grounding · Visual Prompting
