SOTAVerified

Referring Expression

Referring expression comprehension: given an image and a natural-language description, place a bounding box around the object instance that the description refers to.
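Because the task output is a single box per expression, results are commonly reported as accuracy at an IoU threshold of 0.5: a prediction counts as correct when its intersection-over-union with the annotated box is at least 0.5. The sketch below assumes that convention and a hypothetical `(x1, y1, x2, y2)` box format. The function names `iou` and `accuracy_at` are illustrative and are not taken from any paper listed here. The `Acc@0.5m` metric in the benchmark table may follow a different definition.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of expressions whose predicted box has IoU >= threshold with the ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


# Two expressions: one exact match, one box shifted so that IoU = 25 / 175.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 5, 15, 15)]
print(accuracy_at(preds, gts))  # 0.5
```

Only one box per expression is scored here, which matches the single-instance framing of the task; multi-instance variants would need a matching step that this sketch does not attempt.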

Papers

Showing 201-250 of 364 papers

Title | Status | Hype
Towards Language-guided Visual Recognition via Dynamic Convolutions | Code | 0
Decoupling Pragmatics: Discriminative Decoding for Referring Expression Generation | | 0
Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution | Code | 0
Goal-driven text descriptions for images | | 0
Airbert: In-domain Pretraining for Vision-and-Language Navigation | Code | 1
What can Neural Referential Form Selectors Learn? | | 0
Enriching the E2E dataset | Code | 0
VLN BERT: A Recurrent Vision-and-Language BERT for Navigation | | 0
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression | Code | 1
Bridging the Gap Between Object Detection and User Intent via Query-Modulation | | 0
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations? | Code | 0
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding | Code | 1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | Code | 0
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching | | 0
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | | 0
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
Playing Lottery Tickets with Vision and Language | | 0
Understanding Synonymous Referring Expressions via Contrastive Features | Code | 0
Perspective-corrected Spatial Referring Expression Generation for Human-Robot Interaction | | 0
Scene-Intuitive Agent for Remote Embodied Visual Grounding | | 0
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos | | 0
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding | Code | 1
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Code | 1
Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network | | 0
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | | 0
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Code | 1
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Code | 2
Language Controls More Than Top-Down Attention: Modulating Bottom-Up Visual Processing with Referring Expressions | | 0
Language-Mediated, Object-Centric Representation Learning | | 0
PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension | | 0
Generating Quantified Referring Expressions through Attention-Driven Incremental Perception | | 0
Improving the Naturalness and Diversity of Referring Expression Generation models using Minimum Risk Training | | 0
OMEGA : A probabilistic approach to referring expression generation in a virtual environment | | 0
A Linguistic Perspective on Reference: Choosing a Feature Set for Generating Referring Expressions in Context | | 0
CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation | | 0
Referring to what you know and do not know: Making Referring Expression Generation Models Generalize To Unseen Entities | | 0
A Recurrent Vision-and-Language BERT for Navigation | Code | 1
Modular Graph Attention Network for Complex Visual Relational Reasoning | | 0
ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments | | 0
Lessons from Computational Modelling of Reference Production in Mandarin and English | | 0
Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Code | 1
Utilizing Every Image Object for Semi-supervised Phrase Grounding | | 0
Computational Interpretations of Recency for the Choice of Referring Expressions in Discourse | | 0
Language-Conditioned Feature Pyramids for Visual Selection Tasks | Code | 0
Learning to Represent Image and Text with Denotation Graph | | 0
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding | Code | 1
Fuzzy Logic for Vagueness Management in Referring Expression Generation | | 0
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | | Unverified