| Face2Text revisited: Improved data set and baseline results | May 24, 2022 | Image Description, Transfer Learning | Unverified | 0 |
| Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation | May 2, 2022 | Image Description, Machine Translation | Unverified | 0 |
| Multimodal fusion via cortical network inspired losses | May 1, 2022 | Emotion Recognition, Image Description | Unverified | 0 |
| UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Nov 23, 2021 | Image Captioning, Image Description | Code Available | 1 |
| Neural Dependency Coding inspired Multimodal Fusion | Sep 28, 2021 | Emotion Recognition, Image Description | Unverified | 0 |
| CIDEr-R: Robust Consensus-based Image Description Evaluation | Sep 28, 2021 | Descriptive, Image Description | Code Available | 0 |
| Cross Modification Attention Based Deliberation Model for Image Captioning | Sep 17, 2021 | Decoder, Descriptive | Unverified | 0 |
| SafeAccess+: An Intelligent System to make Smart Home Safer and Americans with Disability Act Compliant | Sep 14, 2021 | Image Description | Unverified | 0 |
| Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP | Sep 6, 2021 | Image Description, Out-of-Distribution Detection | Code Available | 1 |
| Revisiting Binary Local Image Description for Resource Limited Devices | Aug 18, 2021 | Image Description, Triplet | Code Available | 1 |
| Computer Vision and Conflicting Values: Describing People with Automated Alt Text | May 26, 2021 | Image Description | Unverified | 0 |
| How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain | Dec 1, 2020 | Image Description | Code Available | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignment, Image Captioning | Code Available | 0 |
| EPYNET: Efficient Pyramidal Network for Clothing Segmentation | Oct 13, 2020 | Data Augmentation, Image Description | Unverified | 0 |
| Multimodal Word Sense Disambiguation in Creative Practice | Jul 15, 2020 | Classification, Descriptive | Code Available | 0 |
| On the use of human reference data for evaluating automatic image descriptions | Jun 15, 2020 | Image Description | Unverified | 0 |
| ParaCNN: Visual Paragraph Generation via Adversarial Twin Contextual CNNs | Apr 21, 2020 | Image Captioning, Image Description | Unverified | 0 |
| Textual Visual Semantic Dataset for Text Spotting | Apr 21, 2020 | Image Description, text similarity | Unverified | 0 |
| On Architectures for Including Visual Information in Neural Language Models for Image Description | Nov 9, 2019 | Image Description, Language Modeling | Code Available | 0 |
| Tell Me More: A Dataset of Visual Scene Description Sequences | Oct 1, 2019 | Image Description, Sentence | Unverified | 0 |
| A Hierarchical Approach for Visual Storytelling Using Image Description | Sep 26, 2019 | Decoder, Image Description | Unverified | 0 |
| VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions | Jul 22, 2019 | Image Description, Semantic Similarity | Unverified | 0 |
| Place recognition in gardens by learning visual representations: data set and benchmark analysis | Jun 28, 2019 | Camera Localization, Image Description | Unverified | 0 |
| Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments | Jun 23, 2019 | Image Description, Person Re-Identification | Unverified | 0 |
| What a neural language model tells us about spatial relations | Jun 1, 2019 | Image Description, Language Modeling | Code Available | 0 |
| The Lexical Gap: An Improved Measure of Automated Image Description Quality | May 1, 2019 | 2k, Diversity | Unverified | 0 |
| Multi-modal gated recurrent units for image description | Apr 20, 2019 | Image Description, Sentence | Unverified | 0 |
| Human Attention in Image Captioning: Dataset and Analysis | Mar 6, 2019 | Image Captioning, Image Description | Code Available | 0 |
| Sequential Attention GAN for Interactive Image Editing | Dec 20, 2018 | Image Description, Image Generation | Unverified | 0 |
| Grounded Video Description | Dec 17, 2018 | Image Description, Sentence | Code Available | 1 |
| Unsupervised Image Captioning | Nov 27, 2018 | Image Captioning, Image Description | Code Available | 0 |
| Adding the Third Dimension to Spatial Relation Detection in 2D Images | Nov 1, 2018 | Image Description, Object | Unverified | 0 |
| The Task Matters: Comparing Image Captioning and Task-Based Dialogical Image Description | Nov 1, 2018 | Image Captioning, Image Description | Unverified | 0 |
| Talking about other people: an endless range of possibilities | Nov 1, 2018 | Image Description, Text Generation | Code Available | 0 |
| Recurrent Attention Unit | Oct 30, 2018 | General Classification, Handwriting Recognition | Unverified | 0 |
| Exploring Visual Relationship for Image Captioning | Sep 19, 2018 | Decoder, Image Captioning | Unverified | 0 |
| Unsupervised Stylish Image Description Generation via Domain Layer Norm | Sep 11, 2018 | Image Description | Unverified | 0 |
| DIDEC: The Dutch Image Description and Eye-tracking Corpus | Aug 1, 2018 | Image Description, Specificity | Unverified | 0 |
| Measuring the Diversity of Automatic Image Descriptions | Aug 1, 2018 | Diversity, Image Description | Code Available | 0 |
| Varying image description tasks: spoken versus written descriptions | Aug 1, 2018 | Image Description | Code Available | 0 |
| Deep Imbalanced Attribute Classification using Visual Attention Aggregation | Jul 10, 2018 | Attribute, Classification | Code Available | 0 |
| Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | Jul 1, 2018 | Image Description, Machine Translation | Unverified | 0 |
| Bridging Languages through Images with Deep Partial Canonical Correlation Analysis | Jul 1, 2018 | Image Description, Image Retrieval | Code Available | 0 |
| Contextualize, Show and Tell: A Neural Visual Storyteller | Jun 3, 2018 | Decoder, Image Description | Code Available | 0 |
| Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | Jun 1, 2018 | Caption Generation, Image Captioning | Unverified | 0 |
| Multimodal Machine Translation with Reinforcement Learning | May 7, 2018 | Image Description, Machine Translation | Unverified | 0 |
| Customized Image Narrative Generation via Interactive Visual Question Generation and Answering | Apr 27, 2018 | Diversity, Image Description | Unverified | 0 |
| Compositional Obverter Communication Learning From Raw Visual Input | Apr 6, 2018 | Image Description | Code Available | 0 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Mar 10, 2018 | Image Description, Image to text | Code Available | 0 |
| Zero-Resource Neural Machine Translation with Multi-Agent Communication Game | Feb 9, 2018 | Decoder, Image Captioning | Unverified | 0 |