Referring Expression Comprehension

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–50 of 167 papers

Title	Date	Tasks	Status	Hype	Score
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model	Apr 10, 2025	Language ModelingLanguage Modelling	CodeCode Available	9	5
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Dec 13, 2024	Chart UnderstandingMixture-of-Experts	CodeCode Available	9	5
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models	Mar 27, 2024	Image ClassificationImage Comprehension	CodeCode Available	7	5
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning	Oct 14, 2023	Image ClassificationImage Description	CodeCode Available	7	5
Improved Baselines with Visual Instruction Tuning	Oct 5, 2023	Factual Inconsistency Detection in Chart CaptioningImage Classification	CodeCode Available	6	5
Visual Instruction Tuning	Apr 17, 2023	1 Image, 2*2 Stitching3D Question Answering (3D-QA)	CodeCode Available	6	5
Efficient Multimodal Learning from Data-centric Perspective	Feb 18, 2024	Image ClassificationReferring Expression Comprehension	CodeCode Available	5	5
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection	Mar 9, 2023	DecoderObject Detection	CodeCode Available	5	5
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day	Jun 1, 2023	Image ClassificationInstruction Following	CodeCode Available	4	5
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V	Oct 17, 2023	Interactive SegmentationReferring Expression	CodeCode Available	4	5
General Object Foundation Model for Images and Videos at Scale	Dec 14, 2023	Instance SegmentationLong-tail Video Object Segmentation	CodeCode Available	3	5
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	May 18, 2023	1 Image, 2*2 StitchiAction Classification	CodeCode Available	3	5
Towards Visual Grounding: A Survey	Dec 28, 2024	Phrase GroundingReferring Expression	CodeCode Available	3	5
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices	Dec 28, 2023	AutoMLCPU	CodeCode Available	3	5
Universal Instance Perception as Object Discovery and Retrieval	Mar 12, 2023	Described Object DetectionGeneralized Referring Expression Comprehension	CodeCode Available	3	5
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding	Jan 1, 2021	Phrase GroundingQuestion Answering	CodeCode Available	2	5
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models	Jun 24, 2024	Referring ExpressionReferring Expression Comprehension	CodeCode Available	2	5
Frontiers in Intelligent Colonoscopy	Oct 22, 2024	Image Captioning	CodeCode Available	2	5
GREC: Generalized Referring Expression Comprehension	Aug 30, 2023	Generalized Referring Expression ComprehensionReferring Expression	CodeCode Available	2	5
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models	May 29, 2025	Referring ExpressionReferring Expression Comprehension	CodeCode Available	2	5
Elysium: Exploring Object-level Perception in Videos via MLLM	Mar 25, 2024	ObjectObject Tracking	CodeCode Available	2	5
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion	Sep 26, 2024	DescriptiveGeneralized Referring Expression Comprehension	CodeCode Available	2	5
Referring Transformer: A One-step Approach to Multi-task Visual Grounding	Jun 6, 2021	DecoderReferring Expression	CodeCode Available	1	5
Described Object Detection: Liberating Object Detection with Flexible Expressions	Jul 24, 2023	Binary ClassificationDescribed Object Detection	CodeCode Available	1	5
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding	Apr 26, 2021	Generalized Referring Expression ComprehensionPhrase Grounding	CodeCode Available	1	5
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone	Jun 15, 2022	Described Object DetectionImage Captioning	CodeCode Available	1	5
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds	Dec 16, 2021	Objectobject-detection	CodeCode Available	1	5
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM	Mar 19, 2024	Objectobject-detection	CodeCode Available	1	5
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D	Aug 23, 2023	ObjectObject Tracking	CodeCode Available	1	5
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding	Nov 28, 2022	object-detectionObject Detection	CodeCode Available	1	5
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension	Sep 20, 2024	cross-modal alignmentReferring Expression	CodeCode Available	1	5
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension	Apr 12, 2022	image-classificationImage Classification	CodeCode Available	1	5
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes	Feb 1, 2025	Referring ExpressionReferring Expression Comprehension	CodeCode Available	1	5
SeqTR: A Simple yet Universal Network for Visual Grounding	Mar 30, 2022	DecoderReferring Expression	CodeCode Available	1	5
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations	Mar 23, 2023	Question AnsweringReferring Expression	CodeCode Available	1	5
Learning to Evaluate Performance of Multi-modal Semantic Localization	Sep 14, 2022	Cross-Modal RetrievalReferring Expression	CodeCode Available	1	5
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints	Jan 12, 2025	Image SegmentationReferring Expression	CodeCode Available	1	5
Large-Scale Adversarial Training for Vision-and-Language Representation Learning	Jun 11, 2020	Image-text RetrievalQuestion Answering	CodeCode Available	1	5
A Fast and Accurate One-Stage Approach to Visual Grounding	Aug 18, 2019	Referring ExpressionReferring Expression Comprehension	CodeCode Available	1	5
Multi-branch Collaborative Learning Network for 3D Visual Grounding	Jul 7, 2024	3D visual groundingReferring Expression	CodeCode Available	1	5
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension	Sep 23, 2024	Image ComprehensionReferring Expression	CodeCode Available	1	5
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation	Mar 19, 2020	Generalized Referring Expression ComprehensionReferring Expression	CodeCode Available	1	5
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration	Feb 27, 2025	Image ComprehensionReferring Expression	CodeCode Available	1	5
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models	May 23, 2022	Language ModelingLanguage Modelling	CodeCode Available	1	5
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs	Nov 8, 2023	Question AnsweringReferring Expression	CodeCode Available	1	5
Correspondence Matters for Video Referring Expression Comprehension	Jul 21, 2022	Contrastive LearningReferring Expression	CodeCode Available	1	5
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition	Feb 15, 2024	Grounded Multimodal Named Entity RecognitionMulti-modal Named Entity Recognition	CodeCode Available	1	5
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension	Sep 18, 2024	Referring ExpressionReferring Expression Comprehension	CodeCode Available	1	5
A Unified Framework for 3D Point Cloud Visual Grounding	Aug 23, 2023	CPUGPU	CodeCode Available	1	5
Explainable Neural Computation via Stack Neural Module Networks	Jul 23, 2018	Decision MakingQuestion Answering	CodeCode Available	1	5

Show:10 25 50

← PrevPage 1 of 4Next →

No leaderboard results yet.