SOTAVerified

Referring Expression Comprehension

Papers

Showing 1–10 of 167 papers

| Title | Status | Hype |
|---|---|---|
| VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Code | 9 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9 |
| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7 |
| MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning | Code | 7 |
| Improved Baselines with Visual Instruction Tuning | Code | 6 |
| Visual Instruction Tuning | Code | 6 |
| Efficient Multimodal Learning from Data-centric Perspective | Code | 5 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code | 5 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4 |
| LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Code | 4 |

No leaderboard results yet.