SOTAVerified

Referring Expression Comprehension

Papers

Showing 1–25 of 167 papers

| Title | Status | Hype |
| --- | --- | --- |
| VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Code | 9 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7 |
| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7 |
| Improved Baselines with Visual Instruction Tuning | Code | 6 |
| Visual Instruction Tuning | Code | 6 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code | 5 |
| Efficient Multimodal Learning from Data-centric Perspective | Code | 5 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4 |
| LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Code | 4 |
| Towards Visual Grounding: A Survey | Code | 3 |
| Universal Instance Perception as Object Discovery and Retrieval | Code | 3 |
| MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Code | 3 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3 |
| General Object Foundation Model for Images and Videos at Scale | Code | 3 |
| TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2 |
| Frontiers in Intelligent Colonoscopy | Code | 2 |
| Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Code | 2 |
| MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Code | 2 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2 |
| GREC: Generalized Referring Expression Comprehension | Code | 2 |
| SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Code | 2 |
| InstructDET: Diversifying Referring Object Detection with Generalized Instructions | Code | 1 |
| Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1 |

No leaderboard results yet.