SOTAVerified

Natural Language Visual Grounding

Papers

Showing 1-25 of 32 papers

| Title | Status | Hype |
|---|---|---|
| Aria-UI: Visual Grounding for GUI Instructions | Code | 3 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Code | 3 |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Code | 5 |
| Improved GUI Grounding via Iterative Narrowing | Code | 1 |
| OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | Code | 3 |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Code | 3 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Code | 11 |
| OmniParser for Pure Vision Based GUI Agent | Code | 12 |
| GUICourse: From General Vision Language Models to Versatile GUI Agents | Code | 2 |
| Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Code | 4 |
| SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | Code | 3 |
| CogAgent: A Visual Language Model for GUI Agents | Code | 5 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5 |
| Localizing Moments in Long Video Via Multimodal Guidance | Code | 1 |
| Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences | | 0 |
| Belief Revision based Caption Re-ranker with Visual Semantic Information | Code | 1 |
| TubeDETR: Spatio-Temporal Video Grounding with Transformers | Code | 1 |
| CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks | Code | 1 |
| Panoptic Narrative Grounding | Code | 1 |
| Composing Pick-and-Place Tasks By Grounding Language | Code | 0 |
| Panoptic Narrative Grounding | Code | 1 |
| ALFWorld: Aligning Text and Embodied Environments for Interactive Learning | Code | 1 |
| A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions | Code | 1 |
| Learning Cross-modal Context Graph for Visual Grounding | Code | 1 |

Benchmark Results

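| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UGround-V1-7B | Accuracy (%) | 86.34 | | Unverified |
| 2 | Aguvis-7B | Accuracy (%) | 83 | | Unverified |
| 3 | OS-Atlas-Base-7B | Accuracy (%) | 82.47 | | Unverified |
| 4 | Aria-UI | Accuracy (%) | 81.1 | | Unverified |
| 5 | Aguvis-G-7B | Accuracy (%) | 81 | | Unverified |
| 6 | UGround-V1-2B | Accuracy (%) | 77.67 | | Unverified |
| 7 | ShowUI | Accuracy (%) | 75.1 | | Unverified |
| 8 | ShowUI-G | Accuracy (%) | 75 | | Unverified |
| 9 | UGround | Accuracy (%) | 73.3 | | Unverified |
| 10 | OmniParser | Accuracy (%) | 73 | | Unverified |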