SOTAVerified

Natural Language Visual Grounding

Papers

Showing 21-30 of 32 papers

| Title | Status | Hype |
|---|---|---|
| ALFWorld: Aligning Text and Embodied Environments for Interactive Learning | Code | 1 |
| A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions | Code | 1 |
| Learning Cross-modal Context Graph for Visual Grounding | Code | 1 |
| ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks | Code | 1 |
| Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | Code | 1 |
| Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences | | 0 |
| Composing Pick-and-Place Tasks By Grounding Language | Code | 0 |
| Searching for Ambiguous Objects in Videos using Relational Referring Expressions | Code | 0 |
| Modularized Textual Grounding for Counterfactual Resilience | Code | 0 |
| Robust Change Captioning | Code | 0 |

Benchmark Results

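| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UGround-V1-7B | Accuracy (%) | 86.34 | | Unverified |
| 2 | Aguvis-7B | Accuracy (%) | 83 | | Unverified |
| 3 | OS-Atlas-Base-7B | Accuracy (%) | 82.47 | | Unverified |
| 4 | Aria-UI | Accuracy (%) | 81.1 | | Unverified |
| 5 | Aguvis-G-7B | Accuracy (%) | 81 | | Unverified |
| 6 | UGround-V1-2B | Accuracy (%) | 77.67 | | Unverified |
| 7 | ShowUI | Accuracy (%) | 75.1 | | Unverified |
| 8 | ShowUI-G | Accuracy (%) | 75 | | Unverified |
| 9 | UGround | Accuracy (%) | 73.3 | | Unverified |
| 10 | OmniParser | Accuracy (%) | 73 | | Unverified |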