SOTAVerified

Referring Expression Comprehension

Papers

Showing 51-100 of 167 papers

| Title | Status | Hype |
| --- | --- | --- |
| Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1 |
| MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Code | 3 |
| Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | - | 0 |
| General Object Foundation Model for Images and Videos at Scale | Code | 3 |
| Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | - | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | - | 0 |
| Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Code | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0 |
| GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Code | 1 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | - | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | - | 0 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7 |
| InstructDET: Diversifying Referring Object Detection with Generalized Instructions | Code | 1 |
| Improved Baselines with Visual Instruction Tuning | Code | 6 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0 |
| GREC: Generalized Referring Expression Comprehension | Code | 2 |
| HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Code | 0 |
| RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1 |
| A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1 |
| Whether you can locate or not? Interactive Referring Expression Generation | Code | 0 |
| Described Object Detection: Liberating Object Detection with Flexible Expressions | Code | 1 |
| Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | - | 0 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | - | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0 |
| Referring Expression Comprehension Using Language Adaptive Inference | Code | 0 |
| LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Code | 4 |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | - | 0 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3 |
| Visual Instruction Tuning | Code | 6 |
| NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Code | 1 |
| Universal Instance Perception as Object Discovery and Retrieval | Code | 3 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code | 5 |
| CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Code | 0 |
| PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Code | 1 |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | - | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | - | 0 |
| Dynamic Inference With Grounding Based Vision and Language Models | - | 0 |
| DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Code | 1 |
| Scene-Text Oriented Referring Expression Comprehension | Code | 0 |
| TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Code | 1 |
| VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment | Code | 1 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | - | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0 |
| Learning to Evaluate Performance of Multi-modal Semantic Localization | Code | 1 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | - | 0 |
| Correspondence Matters for Video Referring Expression Comprehension | Code | 1 |
Page 2 of 4

No leaderboard results yet.