Referring Expression Comprehension

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–100 of 167 papers

Title	Date	Tasks	Status	Hype
Tune-An-Ellipse: CLIP Has Potential to Find What You Want	Jan 1, 2024	ObjectReferring Expression	CodeCode Available	1
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices	Dec 28, 2023	AutoMLCPU	CodeCode Available	3
Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction	Dec 21, 2023	16kAttribute	—Unverified	0
General Object Foundation Model for Images and Videos at Scale	Dec 14, 2023	Instance SegmentationLong-tail Video Object Segmentation	CodeCode Available	3
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects	Dec 8, 2023	Image Captioningobject-detection	—Unverified	0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection	Dec 4, 2023	Image to textobject-detection	—Unverified	0
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions	Nov 28, 2023	DisentanglementReferring Expression	CodeCode Available	1
Continual Referring Expression Comprehension via Dual Modular Memorization	Nov 25, 2023	MemorizationReferring Expression	CodeCode Available	0
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models	Nov 24, 2023	AllReferring Expression	—Unverified	0
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models	Nov 21, 2023	Image SegmentationLanguage Modelling	CodeCode Available	0
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs	Nov 8, 2023	Question AnsweringReferring Expression	CodeCode Available	1
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding	Nov 6, 2023	CoLAQuestion Answering	—Unverified	0
Video Referring Expression Comprehension via Transformer with Content-conditioned Query	Oct 25, 2023	cross-modal alignmentReferring Expression	—Unverified	0
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V	Oct 17, 2023	Interactive SegmentationReferring Expression	CodeCode Available	4
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning	Oct 14, 2023	Image ClassificationImage Description	CodeCode Available	7
InstructDET: Diversifying Referring Object Detection with Generalized Instructions	Oct 8, 2023	Language ModelingLanguage Modelling	CodeCode Available	1
Improved Baselines with Visual Instruction Tuning	Oct 5, 2023	Factual Inconsistency Detection in Chart CaptioningImage Classification	CodeCode Available	6
Collecting Visually-Grounded Dialogue with A Game Of Sorts	Sep 10, 2023	Coreference ResolutionImage Retrieval	CodeCode Available	0
GREC: Generalized Referring Expression Comprehension	Aug 30, 2023	Generalized Referring Expression ComprehensionReferring Expression	CodeCode Available	2
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks	Aug 24, 2023	Language ModelingLanguage Modelling	CodeCode Available	0
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D	Aug 23, 2023	ObjectObject Tracking	CodeCode Available	1
A Unified Framework for 3D Point Cloud Visual Grounding	Aug 23, 2023	CPUGPU	CodeCode Available	1
Whether you can locate or not? Interactive Referring Expression Generation	Aug 19, 2023	Referring ExpressionReferring Expression Comprehension	CodeCode Available	0
Described Object Detection: Liberating Object Detection with Flexible Expressions	Jul 24, 2023	Binary ClassificationDescribed Object Detection	CodeCode Available	1
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks	Jul 14, 2023	ObjectReferring Expression	—Unverified	0
Kosmos-2: Grounding Multimodal Large Language Models to the World	Jun 26, 2023	Image CaptioningIn-Context Learning	CodeCode Available	1
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input	Jun 25, 2023	DiversityImage-text Retrieval	—Unverified	0
Language Adaptive Weight Generation for Multi-task Visual Grounding	Jun 6, 2023	Referring ExpressionReferring Expression Comprehension	CodeCode Available	0
Referring Expression Comprehension Using Language Adaptive Inference	Jun 6, 2023	object-detectionObject Detection	CodeCode Available	0
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day	Jun 1, 2023	Image ClassificationInstruction Following	CodeCode Available	4
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving	May 25, 2023	3D Object DetectionAutonomous Driving	—Unverified	0
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	May 18, 2023	1 Image, 2*2 StitchiAction Classification	CodeCode Available	3
Visual Instruction Tuning	Apr 17, 2023	1 Image, 2*2 Stitching3D Question Answering (3D-QA)	CodeCode Available	6
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations	Mar 23, 2023	Question AnsweringReferring Expression	CodeCode Available	1
Universal Instance Perception as Object Discovery and Retrieval	Mar 12, 2023	Described Object DetectionGeneralized Referring Expression Comprehension	CodeCode Available	3
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection	Mar 9, 2023	DecoderObject Detection	CodeCode Available	5
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension	Feb 17, 2023	Referring ExpressionReferring Expression Comprehension	CodeCode Available	0
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation	Feb 14, 2023	DecoderImage Segmentation	CodeCode Available	1
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension	Jan 1, 2023	Imitation LearningPseudo Label	—Unverified	0
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension	Jan 1, 2023	Referring ExpressionReferring Expression Comprehension	—Unverified	0
Dynamic Inference With Grounding Based Vision and Language Models	Jan 1, 2023	Language ModellingReferring Expression	—Unverified	0
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding	Nov 28, 2022	object-detectionObject Detection	CodeCode Available	1
Scene-Text Oriented Reffering Expression Comprehension	Nov 4, 2022	Object LocalizationReferring Expression	CodeCode Available	0
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation	Oct 19, 2022	Instance SegmentationReferring Expression	CodeCode Available	1
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment	Oct 9, 2022	object-detectionObject Detection	CodeCode Available	1
Video Referring Expression Comprehension via Transformer with Content-aware Query	Oct 6, 2022	cross-modal alignmentReferring Expression	—Unverified	0
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos	Sep 21, 2022	Action DetectionAction Recognition	CodeCode Available	0
Learning to Evaluate Performance of Multi-modal Semantic Localization	Sep 14, 2022	Cross-Modal RetrievalReferring Expression	CodeCode Available	1
One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning	Jul 31, 2022	AllReferring Expression	—Unverified	0
Correspondence Matters for Video Referring Expression Comprehension	Jul 21, 2022	Contrastive LearningReferring Expression	CodeCode Available	1

Show:10 25 50

← PrevPage 2 of 4Next →

No leaderboard results yet.