
Referring Expression Comprehension

Papers

Showing 1-50 of 167 papers

Title | Status | Hype
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Code | 9
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7
Improved Baselines with Visual Instruction Tuning | Code | 6
Visual Instruction Tuning | Code | 6
Efficient Multimodal Learning from Data-centric Perspective | Code | 5
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code | 5
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Code | 4
Towards Visual Grounding: A Survey | Code | 3
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2
Frontiers in Intelligent Colonoscopy | Code | 2
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Code | 2
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Code | 2
Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2
GREC: Generalized Referring Expression Comprehension | Code | 2
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Code | 2
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Code | 1
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Code | 1
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1
Multi-branch Collaborative Learning Network for 3D Visual Grounding | Code | 1
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension | Code | 1
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Code | 1
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Code | 1
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Code | 1
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Code | 1
InstructDET: Diversifying Referring Object Detection with Generalized Instructions | Code | 1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1
A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1
Described Object Detection: Liberating Object Detection with Flexible Expressions | Code | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Code | 1
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Code | 1
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Code | 1
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Code | 1
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment | Code | 1
Learning to Evaluate Performance of Multi-modal Semantic Localization | Code | 1
Correspondence Matters for Video Referring Expression Comprehension | Code | 1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1
Page 1 of 4

No leaderboard results yet.