
Referring Expression

Referring expression comprehension places a bounding box around the object instance that matches a given natural-language description in a given image.
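Box predictions for this task are commonly scored, for example on RefCOCO-style benchmarks, by accuracy at an IoU threshold of 0.5. A prediction counts as correct when its box overlaps the ground-truth box with intersection-over-union of at least 0.5. Whether the comparison is strict or inclusive varies between codebases. The sketch below is a minimal illustration that assumes boxes in (x1, y1, x2, y2) coordinates and exactly one ground-truth box per expression. The function names `iou` and `accuracy_at_iou` are hypothetical and do not come from any paper listed here.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Fraction of expressions whose predicted box reaches IoU >= thresh with the ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 10), (5, 5, 15, 15)]  # second pair: IoU = 25 / 175
    print(accuracy_at_iou(preds, gts))  # 0.5
```

In the usage example, the first prediction matches exactly (IoU 1.0). The second prediction overlaps its target by only 25 of 175 units of union area (IoU about 0.14), so one of the two expressions counts as correct.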

Papers

Showing 1-50 of 364 papers

Title | Status | Hype
4th PVUW MeViS 3rd Place Report: Sa2VA | Code | 5
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code | 5
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Code | 3
RemoteSAM: Towards Segment Anything for Earth Observation | Code | 3
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
Towards Visual Grounding: A Survey | Code | 3
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Code | 2
Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2
GLaMM: Pixel Grounding Large Multimodal Model | Code | 2
GREC: Generalized Referring Expression Comprehension | Code | 2
GRES: Generalized Referring Expression Segmentation | Code | 2
F-LMM: Grounding Frozen Large Multimodal Models | Code | 2
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Code | 2
NExT-Chat: An LMM for Chat, Detection and Segmentation | Code | 2
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation | Code | 2
Text4Seg: Reimagining Image Segmentation as Text Generation | Code | 2
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Code | 2
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Code | 2
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding | Code | 2
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1
Learning to Evaluate Performance of Multi-modal Semantic Localization | Code | 1
March in Chat: Interactive Prompting for Remote Embodied Referring Expression | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1
Airbert: In-domain Pretraining for Vision-and-Language Navigation | Code | 1
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Code | 1
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Code | 1
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Code | 1
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | Code | 1
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Code | 1
Layout-aware Dreamer for Embodied Referring Expression Grounding | Code | 1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
GSVA: Generalized Segmentation via Multimodal Large Language Models | Code | 1
GRIT: General Robust Image Task Benchmark | Code | 1
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation | Code | 1
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Code | 1
Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding | Code | 1
Advancing Referring Expression Segmentation Beyond Single Image | Code | 1
Graph-Structured Referring Expression Reasoning in The Wild | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Code | 1
A Recurrent Vision-and-Language BERT for Navigation | Code | 1
Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | Code | 1
A Fast and Accurate One-Stage Approach to Visual Grounding | Code | 1
Exploring Contextual Attribute Density in Referring Expression Counting | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 |  | Unverified