SOTAVerified|Agents Browse Leaderboard About Blog

Natural Language Visual Grounding

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–10 of 32 papers

Title	Date	Tasks	Status	Hype
Aria-UI: Visual Grounding for GUI Instructions	Dec 20, 2024	Natural Language Visual GroundingVisual Grounding	CodeCode Available	3
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction	Dec 5, 2024	Multimodal ReasoningNatural Language Visual Grounding	CodeCode Available	3
ShowUI: One Vision-Language-Action Model for GUI Visual Agent	Nov 26, 2024	Instruction FollowingNatural Language Visual Grounding	CodeCode Available	5
Improved GUI Grounding via Iterative Narrowing	Nov 18, 2024	Language ModelingLanguage Modelling	CodeCode Available	1
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents	Oct 30, 2024	Natural Language Visual Grounding	CodeCode Available	3
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents	Oct 7, 2024	Natural Language Visual GroundingNavigate	CodeCode Available	3
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	Sep 18, 2024	Natural Language Visual Grounding	CodeCode Available	11
OmniParser for Pure Vision Based GUI Agent	Aug 1, 2024	Natural Language Visual Grounding	CodeCode Available	12
GUICourse: From General Vision Language Models to Versatile GUI Agents	Jun 17, 2024	Natural Language Visual GroundingOptical Character Recognition (OCR)	CodeCode Available	2
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models	Apr 19, 2024	Language ModelingLanguage Modelling	CodeCode Available	4

Show:10 25 50

← PrevPage 1 of 4Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	UGround-V1-7B	Accuracy (%)	86.34	—	Unverified
2	Aguvis-7B	Accuracy (%)	83	—	Unverified
3	OS-Atlas-Base-7B	Accuracy (%)	82.47	—	Unverified
4	Aria-UI	Accuracy (%)	81.1	—	Unverified
5	Aguvis-G-7B	Accuracy (%)	81	—	Unverified
6	UGround-V1-2B	Accuracy (%)	77.67	—	Unverified
7	ShowUI	Accuracy (%)	75.1	—	Unverified
8	ShowUI-G	Accuracy (%)	75	—	Unverified
9	UGround	Accuracy (%)	73.3	—	Unverified
10	OmniParser	Accuracy (%)	73	—	Unverified