SOTAVerified

Referring Expression Comprehension

Papers

Showing 101-150 of 167 papers

Title | Status | Hype
RefCrowd: Grounding the Target in Crowd with Referring Expressions | | 0
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | | 0
Referring Expression Comprehension: A Survey of Methods and Datasets | | 0
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension | | 0
Referring Expression Instance Retrieval and A Strong End-to-End Baseline | | 0
Video Referring Expression Comprehension via Transformer with Content-aware Query | | 0
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | | 0
Revisiting Multi-Modal LLM Evaluation | | 0
ScanFormer: Referring Expression Comprehension by Iteratively Scanning | | 0
Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | | 0
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | | 0
Dynamic Graph Attention for Referring Expression Comprehension | | 0
Dynamic Inference With Grounding Based Vision and Language Models | | 0
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | | 0
Differentiated Relevances Embedding for Group-based Referring Expression Comprehension | | 0
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | | 0
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | | 0
Exploring Spatial Language Grounding Through Referring Expressions | | 0
FindIt: Generalized Localization with Natural Language Queries | | 0
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | | 0
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | | 0
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping | | 0
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | | 0
Synthetic Visual Genome | | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding | | 0
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension | | 0
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | | 0
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | | 0
Video Referring Expression Comprehension via Transformer with Content-conditioned Query | | 0
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | | 0
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | | 0
Language-Mediated, Object-Centric Representation Learning | | 0
Text-driven Affordance Learning from Egocentric Vision | | 0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | | 0
Leveraging Non-Specialists for Accurate and Time Efficient AMR Annotation | | 0
Learning Visual Grounding from Generative Vision and Language Model | | 0
Lite-MDETR: A Lightweight Multi-Modal Detector | | 0
The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | | 0
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | | 0
M^2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | | 0
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | | 0
ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments | | 0
MaskInversion: Localized Embeddings via Optimization of Explainability Maps | | 0
A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension | | 0
Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | | 0
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary | | 0
Evaluating and Improving Interactions with Hazy Oracles | | 0
Modular Graph Attention Network for Complex Visual Relational Reasoning | | 0

No leaderboard results yet.