
Referring Expression

Referring expression comprehension places a bounding box around the object instance in the image that corresponds to the provided natural-language description.
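Because the output of this task is a single box per description, a prediction is commonly scored by its intersection-over-union (IoU) with the ground-truth box. Benchmarks such as RefCOCO typically count a prediction as correct when IoU is at least 0.5. The sketch below is a minimal illustration under stated assumptions: boxes are given as `(x1, y1, x2, y2)` pixel corners, and the helper names `iou` and `accuracy_at` are hypothetical, not taken from any paper listed here. It does not define the `Acc@0.5m` metric in the benchmark table below, whose exact definition this page does not state.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping region (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Fraction of expressions whose predicted box has IoU >= thresh with the ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(preds)


# Usage: one exact match and one box shifted by half its width and height.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 5, 15, 15)]
print(accuracy_at(preds, gts))  # → 0.5 (second pair has IoU 25/175 ≈ 0.14)
```

The threshold is a parameter so the same helper can report stricter variants (for example IoU ≥ 0.75) without changing the IoU computation.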

Papers

Showing 51-100 of 364 papers

Title | Status | Hype
GSVA: Generalized Segmentation via Multimodal Large Language Models | Code | 1
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Code | 1
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Code | 1
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs | Code | 1
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Code | 1
A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1
March in Chat: Interactive Prompting for Remote Embodied Referring Expression | Code | 1
Described Object Detection: Liberating Object Detection with Flexible Expressions | Code | 1
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation | Code | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1
Advancing Referring Expression Segmentation Beyond Single Image | Code | 1
Zero-shot Referring Image Segmentation with Global-Local Context Features | Code | 1
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Code | 1
Layout-aware Dreamer for Embodied Referring Expression Grounding | Code | 1
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Code | 1
SQA3D: Situated Question Answering in 3D Scenes | Code | 1
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment | Code | 1
Learning to Evaluate Performance of Multi-modal Semantic Localization | Code | 1
Correspondence Matters for Video Referring Expression Comprehension | Code | 1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | Code | 1
GRIT: General Robust Image Task Benchmark | Code | 1
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension | Code | 1
The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
SeqTR: A Simple yet Universal Network for Visual Grounding | Code | 1
Image Segmentation Using Text and Image Prompts | Code | 1
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Code | 1
Airbert: In-domain Pretraining for Vision-and-Language Navigation | Code | 1
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression | Code | 1
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding | Code | 1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding | Code | 1
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Code | 1
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Code | 1
A Recurrent Vision-and-Language BERT for Navigation | Code | 1
Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Code | 1
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding | Code | 1
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Code | 1
Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies | Code | 1
Refer360°: A Referring Expression Recognition Dataset in 360° Images | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions | Code | 1
Graph-Structured Referring Expression Reasoning in The Wild | Code | 1
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | - | Unverified