SOTAVerified

Referring Expression

The referring expression task places a bounding box around the instance in the image that corresponds to the provided description.

Papers

Showing 26–50 of 364 papers

Title | Status | Hype
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Code | 1
A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1
Airbert: In-domain Pretraining for Vision-and-Language Navigation | Code | 1
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Code | 1
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1
Layout-aware Dreamer for Embodied Referring Expression Grounding | Code | 1
A Fast and Accurate One-Stage Approach to Visual Grounding | Code | 1
A Recurrent Vision-and-Language BERT for Navigation | Code | 1
Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | Code | 1
Learning to Evaluate Performance of Multi-modal Semantic Localization | Code | 1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Code | 1
Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Code | 1
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Code | 1
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Code | 1
Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding | Code | 1
Advancing Referring Expression Segmentation Beyond Single Image | Code | 1
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Code | 1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | Code | 1
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding | Code | 1
Exploring Contextual Attribute Density in Referring Expression Counting | Code | 1
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation | Code | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | - | Unverified