| Towards Language-guided Visual Recognition via Dynamic Convolutions | Oct 17, 2021 | Question Answering, Referring Expression | Code Available | 0 |
| Decoupling Pragmatics: Discriminative Decoding for Referring Expression Generation | Oct 1, 2021 | Diversity, Image Captioning | Unverified | 0 |
| Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution | Sep 27, 2021 | coreference-resolution, Coreference Resolution | Code Available | 0 |
| Goal-driven text descriptions for images | Aug 28, 2021 | AI Agent, Caption Generation | Unverified | 0 |
| Airbert: In-domain Pretraining for Vision-and-Language Navigation | Aug 20, 2021 | Navigate, Referring Expression | Code Available | 1 |
| What can Neural Referential Form Selectors Learn? | Aug 15, 2021 | Form, Position | Unverified | 0 |
| Enriching the E2E dataset | Aug 1, 2021 | Referring Expression, Referring expression generation | Code Available | 0 |
| VLN BERT: A Recurrent Vision-and-Language BERT for Navigation | Jun 19, 2021 | Decision Making, Decoder | Unverified | 0 |
| Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression | Jun 19, 2021 | Instruction Following, Navigate | Code Available | 1 |
| Bridging the Gap Between Object Detection and User Intent via Query-Modulation | Jun 18, 2021 | Object, object-detection | Unverified | 0 |
| Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations? | Jun 8, 2021 | Referring Expression, Self-Driving Cars | Code Available | 0 |
| Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding | Jun 8, 2021 | Referring Expression, Sentence | Code Available | 1 |
| Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Jun 6, 2021 | Decoder, Referring Expression | Code Available | 1 |
| Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | May 24, 2021 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching | May 12, 2021 | Image-text matching, Referring Expression | Unverified | 0 |
| Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | May 5, 2021 | Question Answering, Referring Expression | Unverified | 0 |
| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Apr 26, 2021 | Generalized Referring Expression Comprehension, Phrase Grounding | Code Available | 1 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval, Question Answering | Unverified | 0 |
| Understanding Synonymous Referring Expressions via Contrastive Features | Apr 20, 2021 | Object, Referring Expression | Code Available | 0 |
| Perspective-corrected Spatial Referring Expression Generation for Human-Robot Interaction | Apr 4, 2021 | Diversity, Referring Expression | Unverified | 0 |
| Scene-Intuitive Agent for Remote Embodied Visual Grounding | Mar 24, 2021 | cross-modal alignment, Navigate | Unverified | 0 |
| Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos | Mar 23, 2021 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding | Mar 13, 2021 | Referring Expression, Referring Expression Segmentation | Code Available | 1 |
| Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Mar 9, 2021 | Deep Reinforcement Learning, Referring Expression | Code Available | 1 |
| Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network | Feb 9, 2021 | Referring Expression, Referring Expression Segmentation | Unverified | 0 |
| Unifying Vision-and-Language Tasks via Text Generation | Feb 4, 2021 | Conditional Text Generation, Decoder | Code Available | 1 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering, Referring Expression | Unverified | 0 |
| TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Jan 1, 2021 | Question Answering, Referring Expression | Code Available | 1 |
| MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Jan 1, 2021 | Phrase Grounding, Question Answering | Code Available | 2 |
| Language Controls More Than Top-Down Attention: Modulating Bottom-Up Visual Processing with Referring Expressions | Jan 1, 2021 | Referring Expression | Unverified | 0 |
| Language-Mediated, Object-Centric Representation Learning | Dec 31, 2020 | Object, Object Discovery | Unverified | 0 |
| PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension | Dec 20, 2020 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Generating Quantified Referring Expressions through Attention-Driven Incremental Perception | Dec 1, 2020 | Referring Expression, Referring expression generation | Unverified | 0 |
| Improving the Naturalness and Diversity of Referring Expression Generation models using Minimum Risk Training | Dec 1, 2020 | Diversity, Referring Expression | Unverified | 0 |
| OMEGA : A probabilistic approach to referring expression generation in a virtual environment | Dec 1, 2020 | Referring Expression, Referring expression generation | Unverified | 0 |
| A Linguistic Perspective on Reference: Choosing a Feature Set for Generating Referring Expressions in Context | Dec 1, 2020 | Feature Importance, Form | Unverified | 0 |
| CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation | Dec 1, 2020 | Object, Referring Expression | Unverified | 0 |
| Referring to what you know and do not know: Making Referring Expression Generation Models Generalize To Unseen Entities | Dec 1, 2020 | Decoder, Referring Expression | Unverified | 0 |
| A Recurrent Vision-and-Language BERT for Navigation | Nov 26, 2020 | Decision Making, Decoder | Code Available | 1 |
| Modular Graph Attention Network for Complex Visual Relational Reasoning | Nov 22, 2020 | Graph Attention, Question Answering | Unverified | 0 |
| ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments | Nov 15, 2020 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Lessons from Computational Modelling of Reference Production in Mandarin and English | Nov 14, 2020 | Referring Expression, Referring expression generation | Unverified | 0 |
| Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Nov 10, 2020 | Referring Expression, Sentence | Code Available | 1 |
| Utilizing Every Image Object for Semi-supervised Phrase Grounding | Nov 5, 2020 | Phrase Grounding, Referring Expression | Unverified | 0 |
| Computational Interpretations of Recency for the Choice of Referring Expressions in Discourse | Nov 1, 2020 | Referring Expression, Sensitivity | Unverified | 0 |
| Language-Conditioned Feature Pyramids for Visual Selection Tasks | Nov 1, 2020 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| Learning to Represent Image and Text with Denotation Graph | Oct 6, 2020 | Attribute, Image Retrieval | Unverified | 0 |
| Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding | Sep 3, 2020 | Referring Expression, Vocal Bursts Valence Prediction | Code Available | 1 |
| Fuzzy Logic for Vagueness Management in Referring Expression Generation | Sep 1, 2020 | Management, Referring Expression | Unverified | 0 |
| URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | Aug 1, 2020 | Object, One-shot visual object segmentation | Code Available | 1 |