SOTAVerified

Referring Expression Comprehension

Papers

Showing 1-25 of 167 papers

Title | Status | Hype
Referring Expression Instance Retrieval and A Strong End-to-End Baseline | | 0
Synthetic Visual Genome | | 0
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation | Code | 0
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Code | 9
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Code | 1
Exploring Spatial Language Grounding Through Referring Expressions | | 0
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | | 0
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | | 0
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Code | 1
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | | 0
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | | 0
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | | 0
Towards Visual Grounding: A Survey | Code | 3
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | | 0
Frontiers in Intelligent Colonoscopy | Code | 2
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Code | 0
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Code | 2
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1

No leaderboard results yet.