| Paper | Date | Tasks | Code | Stars |
|---|---|---|---|---|
| WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization | May 28, 2024 | Domain Generalization, Image Description | Unverified | 0 |
| Zero-Resource Neural Machine Translation with Multi-Agent Communication Game | Feb 9, 2018 | Decoder, Image Captioning | Unverified | 0 |
| Focused Evaluation for Image Description with Binary Forced-Choice Tasks | Aug 1, 2016 | Image Captioning, Image Description | Unverified | 0 |
| From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning | Oct 11, 2016 | Form, Grounded language learning | Unverified | 0 |
| Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | Jun 1, 2018 | Caption Generation, Image Captioning | Unverified | 0 |
| Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation | May 2, 2022 | Image Description, Machine Translation | Unverified | 0 |
| Im2Text: Describing Images Using 1 Million Captioned Photographs | Dec 1, 2011 | Image Captioning, Image Description | Unverified | 0 |
| Image Description Dataset for Language Learners | Jun 1, 2022 | Image Description, Sentence | Unverified | 0 |
| Image Description using Visual Dependency Representations | Oct 1, 2013 | Image Description, Image Retrieval | Unverified | 0 |
| Image Pivoting for Learning Multilingual Multimodal Representations | Jul 24, 2017 | Image Description, Image Retrieval | Unverified | 0 |
| Impressions: Understanding Visual Semiotics and Aesthetic Impact | Oct 27, 2023 | Image Captioning, Image Description | Unverified | 0 |
| Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments | Jun 23, 2019 | Image Description, Person Re-Identification | Unverified | 0 |
| InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | Dec 21, 2023 | Image Description | Unverified | 0 |
| Fan-Beam Binarization Difference Projection (FB-BDP): A Novel Local Object Descriptor for Fine-Grained Leaf Image Retrieval | Jan 1, 2023 | Binarization, Image Description | Code Available | 0 |
| The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification | Nov 27, 2014 | General Classification, image-classification | Code Available | 0 |
| On Architectures for Including Visual Information in Neural Language Models for Image Description | Nov 9, 2019 | Image Description, Language Modeling | Code Available | 0 |
| Bridging Languages through Images with Deep Partial Canonical Correlation Analysis | Jul 1, 2018 | Image Description, Image Retrieval | Code Available | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignment, Image Captioning | Code Available | 0 |
| Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions | Mar 10, 2018 | Image Description, Image to text | Code Available | 0 |
| Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline) | Nov 26, 2017 | Image Description, Person Re-Identification | Code Available | 0 |
| How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain | Dec 1, 2020 | Image Description | Code Available | 0 |
| IDEA: Image Description Enhanced CLIP-Adapter | Jan 15, 2025 | Few-Shot Image Classification, image-classification | Code Available | 0 |
| Deep Imbalanced Attribute Classification using Visual Attention Aggregation | Jul 10, 2018 | Attribute, Classification | Code Available | 0 |
| Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network | Jul 12, 2022 | Action Recognition, Image Description | Code Available | 0 |
| Unsupervised Image Captioning | Nov 27, 2018 | Image Captioning, Image Description | Code Available | 0 |
| Compositional Obverter Communication Learning From Raw Visual Input | Apr 6, 2018 | Image Description | Code Available | 0 |
| Efficient Decentralized Visual Place Recognition From Full-Image Descriptors | May 30, 2017 | Clustering, Image Description | Code Available | 0 |
| Talking about other people: an endless range of possibilities | Nov 1, 2018 | Image Description, Text Generation | Code Available | 0 |
| Human Attention in Image Captioning: Dataset and Analysis | Mar 6, 2019 | Image Captioning, Image Description | Code Available | 0 |
| Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings | Mar 30, 2016 | Image Description, Image Retrieval | Code Available | 0 |
| CIDEr-R: Robust Consensus-based Image Description Evaluation | Sep 28, 2021 | Descriptive, Image Description | Code Available | 0 |
| Pragmatic factors in image description: the case of negations | Jun 20, 2016 | Image Description, Negation | Code Available | 0 |
| Large Language Models can Share Images, Too! | Oct 23, 2023 | Image Description, Sentence | Code Available | 0 |
| Cross-linguistic differences and similarities in image descriptions | Jul 6, 2017 | Image Description, Specificity | Code Available | 0 |
| Varying image description tasks: spoken versus written descriptions | Aug 1, 2018 | Image Description | Code Available | 0 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Dec 8, 2023 | Image Description, Instruction Following | Code Available | 0 |
| Long-term Recurrent Convolutional Networks for Visual Recognition and Description | Nov 17, 2014 | Image Description, Retrieval | Code Available | 0 |
| Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval | Oct 10, 2022 | Cross-Modal Information Retrieval, Image Description | Code Available | 0 |
| MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity, Image Description | Code Available | 0 |
| Measuring the Diversity of Automatic Image Descriptions | Aug 1, 2018 | Diversity, Image Description | Code Available | 0 |
| MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps | Oct 18, 2024 | Image Description, Informativeness | Code Available | 0 |
| What a neural language model tells us about spatial relations | Jun 1, 2019 | Image Description, Language Modeling | Code Available | 0 |
| Does Multimodality Help Human and Machine for Translation and Image Captioning? | May 30, 2016 | Image Captioning, Image Description | Code Available | 0 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 |
| Describing Videos by Exploiting Temporal Structure | Feb 27, 2015 | Action Recognition, Image Description | Code Available | 0 |
| VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Mar 10, 2025 | Image Description, Multiple-choice | Code Available | 0 |
| Multi30K: Multilingual English-German Image Descriptions | May 2, 2016 | Image Description, Machine Translation | Code Available | 0 |
| Contextualize, Show and Tell: A Neural Visual Storyteller | Jun 3, 2018 | Decoder, Image Description | Code Available | 0 |
| Multilingual Image Description with Neural Sequence Models | Oct 15, 2015 | Image Captioning, Image Description | Code Available | 0 |
| Room for improvement in automatic image description: an error analysis | Apr 13, 2017 | Image Description | Code Available | 0 |