Referring Expression

The referring expression task takes an image and a description, and places a bounding box around the instance that the description refers to.
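As a rough illustration of the task's output, the sketch below represents a predicted box and a ground-truth box and compares them by intersection over union (IoU). The `Box` layout (`x1, y1, x2, y2` in pixels), the function names, and the 0.5 threshold are assumptions made for this example; the page itself does not specify a box format or an evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; the (x1, y1, x2, y2) pixel layout is an assumption."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp at zero so inverted or empty boxes contribute no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    overlap = Box(
        max(a.x1, b.x1), max(a.y1, b.y1),
        min(a.x2, b.x2), min(a.y2, b.y2),
    ).area()
    union = a.area() + b.area() - overlap
    return overlap / union if union > 0 else 0.0


def is_correct(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Count a prediction as correct when IoU meets the (assumed) threshold."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    predicted = Box(0, 0, 10, 10)     # hypothetical model output
    ground_truth = Box(5, 0, 15, 10)  # hypothetical annotation
    print(iou(predicted, ground_truth), is_correct(predicted, ground_truth))
```

Under this convention a model's score on a dataset would be the fraction of descriptions whose predicted box passes the threshold; individual benchmarks may define correctness differently.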

Papers

Showing 51–100 of 364 papers

Title	Date	Tasks	Status	Hype
GSVA: Generalized Segmentation via Multimodal Large Language Models	Dec 15, 2023	Decoder, Generalized Referring Expression Segmentation	Code Available	1
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation	Dec 13, 2023	Descriptive, Object	Code Available	1
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions	Nov 28, 2023	Disentanglement, Referring Expression	Code Available	1
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs	Nov 8, 2023	Question Answering, Referring Expression	Code Available	1
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs	Oct 1, 2023	Referring Expression	Code Available	1
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation	Aug 31, 2023	Navigate, Referring Expression	Code Available	1
A Unified Framework for 3D Point Cloud Visual Grounding	Aug 23, 2023	CPU, GPU	Code Available	1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D	Aug 23, 2023	Object, Object Tracking	Code Available	1
March in Chat: Interactive Prompting for Remote Embodied Referring Expression	Aug 20, 2023	Referring Expression, Vision and Language Navigation	Code Available	1
Described Object Detection: Liberating Object Detection with Flexible Expressions	Jul 24, 2023	Binary Classification, Described Object Detection	Code Available	1
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation	Jul 3, 2023	Image Segmentation, Referring Expression	Code Available	1
Kosmos-2: Grounding Multimodal Large Language Models to the World	Jun 26, 2023	Image Captioning, In-Context Learning	Code Available	1
Advancing Referring Expression Segmentation Beyond Single Image	May 21, 2023	Co-Salient Object Detection, Object	Code Available	1
Zero-shot Referring Image Segmentation with Global-Local Context Features	Mar 31, 2023	Image Segmentation, Referring Expression	Code Available	1
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations	Mar 23, 2023	Question Answering, Referring Expression	Code Available	1
Layout-aware Dreamer for Embodied Referring Expression Grounding	Nov 30, 2022	Common Sense Reasoning, Navigate	Code Available	1
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation	Oct 19, 2022	Instance Segmentation, Referring Expression	Code Available	1
SQA3D: Situated Question Answering in 3D Scenes	Oct 14, 2022	Question Answering, Referring Expression	Code Available	1
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment	Oct 9, 2022	object-detection, Object Detection	Code Available	1
Learning to Evaluate Performance of Multi-modal Semantic Localization	Sep 14, 2022	Cross-Modal Retrieval, Referring Expression	Code Available	1
Correspondence Matters for Video Referring Expression Comprehension	Jul 21, 2022	Contrastive Learning, Referring Expression	Code Available	1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations	Jun 30, 2022	Language Modeling, Language Modelling	Code Available	1
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models	May 23, 2022	Language Modeling, Language Modelling	Code Available	1
GRIT: General Robust Image Task Benchmark	Apr 28, 2022	Instance Segmentation, Keypoint Detection	Code Available	1
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension	Apr 17, 2022	Data Augmentation, Referring Expression	Code Available	1
The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts	Apr 12, 2022	Referring Expression	Code Available	1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension	Apr 12, 2022	image-classification, Image Classification	Code Available	1
SeqTR: A Simple yet Universal Network for Visual Grounding	Mar 30, 2022	Decoder, Referring Expression	Code Available	1
Image Segmentation Using Text and Image Prompts	Dec 18, 2021	Decoder, Image Segmentation	Code Available	1
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation	Dec 4, 2021	Decoder, Generalized Referring Expression Segmentation	Code Available	1
Airbert: In-domain Pretraining for Vision-and-Language Navigation	Aug 20, 2021	Navigate, Referring Expression	Code Available	1
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression	Jun 19, 2021	Instruction Following, Navigate	Code Available	1
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding	Jun 8, 2021	Referring Expression, Sentence	Code Available	1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding	Jun 6, 2021	Decoder, Referring Expression	Code Available	1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding	Apr 26, 2021	Generalized Referring Expression Comprehension, Phrase Grounding	Code Available	1
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding	Mar 13, 2021	Referring Expression, Referring Expression Segmentation	Code Available	1
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning	Mar 9, 2021	Deep Reinforcement Learning, Referring Expression	Code Available	1
Unifying Vision-and-Language Tasks via Text Generation	Feb 4, 2021	Conditional Text Generation, Decoder	Code Available	1
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering	Jan 1, 2021	Question Answering, Referring Expression	Code Available	1
A Recurrent Vision-and-Language BERT for Navigation	Nov 26, 2020	Decision Making, Decoder	Code Available	1
Human-centric Spatio-Temporal Video Grounding With Visual Transformers	Nov 10, 2020	Referring Expression, Sentence	Code Available	1
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding	Sep 3, 2020	Referring Expression, Vocal Bursts Valence Prediction	Code Available	1
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark	Aug 1, 2020	Object, One-shot visual object segmentation	Code Available	1
Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies	Jul 31, 2020	Head Detection, Referring Expression	Code Available	1
Refer360°: A Referring Expression Recognition Dataset in 360° Images	Jul 1, 2020	Referring Expression	Code Available	1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning	Jun 11, 2020	Image-text Retrieval, Question Answering	Code Available	1
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions	May 4, 2020	Contrastive Learning, Multi-Task Learning	Code Available	1
Graph-Structured Referring Expression Reasoning in The Wild	Apr 19, 2020	Referring Expression	Code Available	1
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation	Mar 19, 2020	Generalized Referring Expression Comprehension, Referring Expression	Code Available	1
UNITER: UNiversal Image-TExt Representation Learning	Sep 25, 2019	Image-text matching, Image-text Retrieval	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Random	[email protected]	14.6	—	Unverified