SOTAVerified

Referring Expression

Referring expression comprehension places a bounding box around the instance in the provided image that corresponds to the provided description.
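
To illustrate the task's output: a predicted box is commonly scored by its intersection over union (IoU) with the annotated box, and counted as correct when IoU is at least 0.5. The sketch below assumes axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates. The names `Box`, `iou`, and `accuracy_at_iou` are hypothetical and do not come from any paper listed here, and individual benchmarks may use a different metric.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned box; (x1, y1) is top-left, (x2, y2) is bottom-right (assumed layout)."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 if they do not overlap."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], targets: Sequence[Box],
                    threshold: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the paired target meets the threshold."""
    if not targets:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(targets)


if __name__ == "__main__":
    predicted = [Box(10, 10, 50, 50), Box(0, 0, 20, 20)]
    annotated = [Box(12, 12, 50, 50), Box(100, 100, 120, 120)]
    print(accuracy_at_iou(predicted, annotated))
```

In this usage example the first prediction overlaps its target closely and the second misses entirely, so one of the two predictions counts as correct.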

Papers

Showing 51-100 of 364 papers

Title | Status | Hype
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Grounding Language in Multi-Perspective Referential Communication | Code | 0
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1
Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | Code | 1
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Code | 0
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | | 0
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation | Code | 2
A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Code | 0
Revisiting Multi-Modal LLM Evaluation | | 0
3D-GRES: Generalized 3D Referring Expression Segmentation | Code | 1
MaskInversion: Localized Embeddings via Optimization of Explainability Maps | | 0
Look Hear: Gaze Prediction for Speech-directed Human Attention | | 0
Learning Visual Grounding from Generative Vision and Language Model | | 0
Multi-branch Collaborative Learning Network for 3D Visual Grounding | Code | 1
The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | | 0
Referring Atomic Video Action Recognition | Code | 1
SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | | 0
M^2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | | 0
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Code | 3
Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | | 0
ScanFormer: Referring Expression Comprehension by Iteratively Scanning | | 0
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Code | 2
F-LMM: Grounding Frozen Large Multimodal Models | Code | 2
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation | Code | 1
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | | 0
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation | Code | 0
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation | Code | 1
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension | Code | 1
Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | Code | 0
Transcrib3D: 3D Referring Expression Resolution through Large Language Models | | 0
Resilience through Scene Context in Visual Referring Expression Generation | Code | 0
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
Text-driven Affordance Learning from Egocentric Vision | | 0
SUGAR: Pre-training 3D Visual Representations for Robotics | | 0
PropTest: Automatic Property Testing for Improved Visual Programming | | 0
Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Code | 3
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | | 0
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Code | 1
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Code | 1
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | | 0
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Code | 1
Intrinsic Task-based Evaluation for Referring Expression Generation | | 0
RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner | | 0
Generalizable Entity Grounding via Assistance of Large Language Model | | 0
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Code | 1
Revisiting Counterfactual Problems in Referring Expression Comprehension | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | | Unverified