Referring Expression Comprehension

Papers

Showing 1–25 of 167 papers

Title	Date	Tasks	Status	Hype
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model	Apr 10, 2025	Language Modeling; Language Modelling	Code Available	9
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Dec 13, 2024	Chart Understanding; Mixture-of-Experts	Code Available	9
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning	Oct 14, 2023	Image Classification; Image Description	Code Available	7
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models	Mar 27, 2024	Image Classification; Image Comprehension	Code Available	7
Improved Baselines with Visual Instruction Tuning	Oct 5, 2023	Factual Inconsistency Detection in Chart Captioning; Image Classification	Code Available	6
Visual Instruction Tuning	Apr 17, 2023	1 Image, 2*2 Stitching; 3D Question Answering (3D-QA)	Code Available	6
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection	Mar 9, 2023	Decoder; Object Detection	Code Available	5
Efficient Multimodal Learning from Data-centric Perspective	Feb 18, 2024	Image Classification; Referring Expression Comprehension	Code Available	5
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V	Oct 17, 2023	Interactive Segmentation; Referring Expression	Code Available	4
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day	Jun 1, 2023	Image Classification; Instruction Following	Code Available	4
Towards Visual Grounding: A Survey	Dec 28, 2024	Phrase Grounding; Referring Expression	Code Available	3
Universal Instance Perception as Object Discovery and Retrieval	Mar 12, 2023	Described Object Detection; Generalized Referring Expression Comprehension	Code Available	3
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices	Dec 28, 2023	AutoML; CPU	Code Available	3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	May 18, 2023	1 Image, 2*2 Stitchi; Action Classification	Code Available	3
General Object Foundation Model for Images and Videos at Scale	Dec 14, 2023	Instance Segmentation; Long-tail Video Object Segmentation	Code Available	3
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models	May 29, 2025	Referring Expression; Referring Expression Comprehension	Code Available	2
Frontiers in Intelligent Colonoscopy	Oct 22, 2024	Image Captioning	Code Available	2
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models	Jun 24, 2024	Referring Expression; Referring Expression Comprehension	Code Available	2
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding	Jan 1, 2021	Phrase Grounding; Question Answering	Code Available	2
Elysium: Exploring Object-level Perception in Videos via MLLM	Mar 25, 2024	Object; Object Tracking	Code Available	2
GREC: Generalized Referring Expression Comprehension	Aug 30, 2023	Generalized Referring Expression Comprehension; Referring Expression	Code Available	2
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion	Sep 26, 2024	Descriptive; Generalized Referring Expression Comprehension	Code Available	2
InstructDET: Diversifying Referring Object Detection with Generalized Instructions	Oct 8, 2023	Language Modeling; Language Modelling	Code Available	1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations	Jun 30, 2022	Language Modeling; Language Modelling	Code Available	1
Kosmos-2: Grounding Multimodal Large Language Models to the World	Jun 26, 2023	Image Captioning; In-Context Learning	Code Available	1

No leaderboard results yet.