SOTAVerified

Referring Expression

The referring expression task places a bounding box around the instance in the provided image that corresponds to the provided description.

Papers

Showing 151–200 of 364 papers

Title | Status | Hype
M^2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | - | 0
Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | - | 0
ScanFormer: Referring Expression Comprehension by Iteratively Scanning | - | 0
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | - | 0
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation | Code | 0
Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | Code | 0
Transcrib3D: 3D Referring Expression Resolution through Large Language Models | - | 0
Resilience through Scene Context in Visual Referring Expression Generation | Code | 0
Text-driven Affordance Learning from Egocentric Vision | - | 0
SUGAR: Pre-training 3D Visual Representations for Robotics | - | 0
PropTest: Automatic Property Testing for Improved Visual Programming | - | 0
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | - | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | - | 0
Intrinsic Task-based Evaluation for Referring Expression Generation | - | 0
RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner | - | 0
Generalizable Entity Grounding via Assistance of Large Language Model | - | 0
Viewpoint-Aware Visual Grounding in 3D Scenes | - | 0
Revisiting Counterfactual Problems in Referring Expression Comprehension | Code | 0
Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | - | 0
Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Code | 0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | - | 0
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Code | 0
Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Code | 0
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | - | 0
Towards Omni-supervised Referring Expression Segmentation | Code | 0
Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | - | 0
Video Referring Expression Comprehension via Transformer with Content-conditioned Query | - | 0
Multi-modal Domain Adaptation for REG via Relation Transfer | - | 0
CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation | - | 0
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0
Whether you can locate or not? Interactive Referring Expression Generation | Code | 0
'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges | Code | 0
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | - | 0
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | - | 0
Referring Expression Comprehension Using Language Adaptive Inference | Code | 0
Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0
DisCLIP: Open-Vocabulary Referring Expression Generation | - | 0
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | - | 0
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples | Code | 0
Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers | Code | 0
Meta Compositional Referring Expression Segmentation | - | 0
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Code | 0
Dynamic Inference With Grounding Based Vision and Language Models | - | 0
Learning To Segment Every Referring Object Point by Point | Code | 0
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | - | 0
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | - | 0
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning | - | 0
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | - | Unverified