SOTAVerified

Referring Expression Comprehension

Papers

Showing 150 of 167 papers

Title | Status | Hype
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Code | 9
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7
Improved Baselines with Visual Instruction Tuning | Code | 6
Visual Instruction Tuning | Code | 6
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code | 5
Efficient Multimodal Learning from Data-centric Perspective | Code | 5
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Code | 4
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
Towards Visual Grounding: A Survey | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Code | 2
Frontiers in Intelligent Colonoscopy | Code | 2
Elysium: Exploring Object-level Perception in Videos via MLLM | Code | 2
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Code | 2
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Code | 2
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models | Code | 2
GREC: Generalized Referring Expression Comprehension | Code | 2
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Described Object Detection: Liberating Object Detection with Flexible Expressions | Code | 1
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds | Code | 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Code | 1
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Code | 1
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Code | 1
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1
SeqTR: A Simple yet Universal Network for Visual Grounding | Code | 1
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Code | 1
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Code | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1
A Fast and Accurate One-Stage Approach to Visual Grounding | Code | 1
Multi-branch Collaborative Learning Network for 3D Visual Grounding | Code | 1
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Code | 1
A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Code | 1
Correspondence Matters for Video Referring Expression Comprehension | Code | 1
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Code | 1
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1
Explainable Neural Computation via Stack Neural Module Networks | Code | 1
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Code | 1

No leaderboard results yet.