SOTAVerified

Referring Expression

Referring expression comprehension places a bounding box around the object instance in an image that matches a provided natural-language description.
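Referring expression comprehension benchmarks such as RefCOCO commonly score a predicted box as correct when its intersection-over-union (IoU) with the annotated box is at least 0.5. The sketch below assumes that criterion and boxes given as (x1, y1, x2, y2) corner coordinates. The function names `iou` and `accuracy_at_iou` are illustrative, not taken from any particular toolkit, and individual leaderboards may define their accuracy metrics differently.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping region (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Fraction of expressions whose predicted box has IoU >= thresh with its ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) >= thresh)
    return hits / len(preds)


# One exact match and one box offset by half its width and height.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 5, 15, 15)]
print(accuracy_at_iou(preds, gts))  # → 0.5
```

The second prediction overlaps its target by 25 of 175 square units (IoU ≈ 0.14), so only the first counts as correct. The threshold is inclusive here, so a box with IoU of exactly 0.5 is scored as a hit.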

Papers

Showing 126-150 of 364 papers

Title | Status | Hype
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0
Cognitive Disentanglement for Referring Multi-Object Tracking | | 0
Exploring Spatial Language Grounding Through Referring Expressions | | 0
Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | | 0
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | | 0
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | | 0
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | | 0
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | | 0
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | | 0
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | | 0
Instance-Aware Generalized Referring Expression Segmentation | | 0
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | | 0
SegLLM: Multi-round Reasoning Segmentation | | 0
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Code | 0
Grounding Language in Multi-Perspective Referential Communication | Code | 0
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Code | 0
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | | 0
A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Code | 0
Revisiting Multi-Modal LLM Evaluation | | 0
MaskInversion: Localized Embeddings via Optimization of Explainability Maps | | 0
Look Hear: Gaze Prediction for Speech-directed Human Attention | | 0
Learning Visual Grounding from Generative Vision and Language Model | | 0
The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | | 0
SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | | Unverified