
Referring Expression

The referring expression task is to place a bounding box around the object instance in an image that corresponds to a given natural-language description.
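Because the output is a bounding box, a prediction is usually scored by how well it overlaps the ground-truth box. The sketch below assumes the widespread convention that a prediction counts as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5. It also assumes boxes are given as (x1, y1, x2, y2) pixel coordinates. The function names `iou` and `accuracy_at_iou` are hypothetical, and this is not necessarily how the Acc@0.5m metric in the benchmark results on this page is defined.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping region (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Fraction of expressions whose predicted box reaches IoU >= thresh with its ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) >= thresh)
    return hits / len(preds)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 10), (5, 5, 15, 15)]
    print(accuracy_at_iou(preds, gts))  # first box matches exactly, second has IoU 25/175
```

In this example, the first prediction matches its ground truth exactly. The second overlaps its ground truth with an IoU of about 0.14, below the threshold, so the accuracy is 0.5.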

Papers

Showing 201-225 of 364 papers

Title | Status | Hype
Towards Language-guided Visual Recognition via Dynamic Convolutions | Code | 0
Decoupling Pragmatics: Discriminative Decoding for Referring Expression Generation | - | 0
Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution | Code | 0
Goal-driven text descriptions for images | - | 0
Airbert: In-domain Pretraining for Vision-and-Language Navigation | Code | 1
What can Neural Referential Form Selectors Learn? | - | 0
Enriching the E2E dataset | Code | 0
VLN BERT: A Recurrent Vision-and-Language BERT for Navigation | - | 0
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression | Code | 1
Bridging the Gap Between Object Detection and User Intent via Query-Modulation | - | 0
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations? | Code | 0
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding | Code | 1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | Code | 0
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching | - | 0
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | - | 0
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
Playing Lottery Tickets with Vision and Language | - | 0
Understanding Synonymous Referring Expressions via Contrastive Features | Code | 0
Perspective-corrected Spatial Referring Expression Generation for Human-Robot Interaction | - | 0
Scene-Intuitive Agent for Remote Embodied Visual Grounding | - | 0
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos | - | 0
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding | Code | 1
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Code | 1
Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | - | Unverified