SOTAVerified

Referring Expression Comprehension

Papers

Showing 101–125 of 167 papers

| Title | Status | Hype |
| --- | --- | --- |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | | 0 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0 |
| HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Code | 0 |
| Whether you can locate or not? Interactive Referring Expression Generation | Code | 0 |
| Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0 |
| Referring Expression Comprehension Using Language Adaptive Inference | Code | 0 |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | | 0 |
| CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Code | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | | 0 |
| Dynamic Inference With Grounding Based Vision and Language Models | | 0 |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | | 0 |
| Scene-Text Oriented Referring Expression Comprehension | Code | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | | 0 |
| Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | | 0 |
| RefCrowd: Grounding the Target in Crowd with Referring Expressions | | 0 |
| Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension | | 0 |
Page 5 of 7

No leaderboard results yet.