SOTAVerified

Referring Expression

The referring expression task places a bounding box around the instance in the provided image that corresponds to the provided description.

Papers

Showing 51–75 of 364 papers

Title | Status | Hype
Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | Code | 1
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Code | 1
Correspondence Matters for Video Referring Expression Comprehension | Code | 1
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | Code | 1
A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1
Airbert: In-domain Pretraining for Vision-and-Language Navigation | Code | 1
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1
Referring Atomic Video Action Recognition | Code | 1
Multi-branch Collaborative Learning Network for 3D Visual Grounding | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Code | 1
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Code | 1
3D-GRES: Generalized 3D Referring Expression Segmentation | Code | 1
Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Code | 1
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Code | 1
Layout-aware Dreamer for Embodied Referring Expression Grounding | Code | 1
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding | Code | 1
GRIT: General Robust Image Task Benchmark | Code | 1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | Code | 1
GSVA: Generalized Segmentation via Multimodal Large Language Models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | | Unverified