SOTAVerified

Referring Expression

The referring expression task places a bounding box around the instance in the provided image that corresponds to the provided description.

Papers

Showing 151-175 of 364 papers

Title | Status | Hype
M^2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | - | 0
Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | - | 0
ScanFormer: Referring Expression Comprehension by Iteratively Scanning | - | 0
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | - | 0
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation | Code | 0
Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | Code | 0
Transcrib3D: 3D Referring Expression Resolution through Large Language Models | - | 0
Resilience through Scene Context in Visual Referring Expression Generation | Code | 0
Text-driven Affordance Learning from Egocentric Vision | - | 0
SUGAR: Pre-training 3D Visual Representations for Robotics | - | 0
PropTest: Automatic Property Testing for Improved Visual Programming | - | 0
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | - | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | - | 0
Intrinsic Task-based Evaluation for Referring Expression Generation | - | 0
RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner | - | 0
Generalizable Entity Grounding via Assistance of Large Language Model | - | 0
Revisiting Counterfactual Problems in Referring Expression Comprehension | Code | 0
Viewpoint-Aware Visual Grounding in 3D Scenes | - | 0
Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | - | 0
Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Code | 0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | - | 0
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Code | 0
Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | - | 0
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | - | Unverified