| Seeing the Unseen: Visual Common Sense for Semantic Placement | Jan 15, 2024 | Common Sense Reasoning, Image Description | Unverified | 0 | 0 |
| Comparing Automatic Evaluation Measures for Image Description | Jun 1, 2014 | Image Description, Slot Filling | Unverified | 0 | 0 |
| DIDEC: The Dutch Image Description and Eye-tracking Corpus | Aug 1, 2018 | Image Description, Specificity | Unverified | 0 | 0 |
| DiffCap: Exploring Continuous Diffusion on Image Captioning | May 20, 2023 | Caption Generation, Diversity | Unverified | 0 | 0 |
| Sequential Attention GAN for Interactive Image Editing | Dec 20, 2018 | Image Description, Image Generation | Unverified | 0 | 0 |
| Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space | Nov 19, 2017 | Caption Generation, Image Description | Unverified | 0 | 0 |
| Does Multimodality Help Human and Machine for Translation and Image Captioning? | May 30, 2016 | Image Captioning, Image Description | Unverified | 0 | 0 |
| Don't Mention the Shoe! A Learning to Rank Approach to Content Selection for Image Description Generation | Sep 1, 2016 | Image Description, Image Retrieval | Unverified | 0 | 0 |
| Doubly-Attentive Decoder for Multi-modal Neural Machine Translation | Feb 4, 2017 | Decoder, Image Description | Unverified | 0 | 0 |
| Draw and Tell: Multimodal Descriptions Outperform Verbal- or Sketch-Only Descriptions in an Image Retrieval Task | Nov 1, 2017 | Image Description, Image Retrieval | Unverified | 0 | 0 |