| Multi30K: Multilingual English-German Image Descriptions | May 2, 2016 | Image Description, Machine Translation | Code Available | 0 | 5 |
| Multilingual Image Description with Neural Sequence Models | Oct 15, 2015 | Image Captioning, Image Description | Code Available | 0 | 5 |
| Multimodal Word Sense Disambiguation in Creative Practice | Jul 15, 2020 | Classification, Descriptive | Code Available | 0 | 5 |
| On Architectures for Including Visual Information in Neural Language Models for Image Description | Nov 9, 2019 | Image Description, Language Modeling | Code Available | 0 | 5 |
| Pragmatic factors in image description: the case of negations | Jun 20, 2016 | Image Description, Negation | Code Available | 0 | 5 |
| Room for improvement in automatic image description: an error analysis | Apr 13, 2017 | Image Description | Code Available | 0 | 5 |
| RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback | Jan 1, 2025 | Hallucination, Image Comprehension | Code Available | 0 | 5 |
| Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network | Jul 12, 2022 | Action Recognition, Image Description | Code Available | 0 | 5 |
| Talking about other people: an endless range of possibilities | Nov 1, 2018 | Image Description, Text Generation | Code Available | 0 | 5 |
| The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification | Nov 27, 2014 | General Classification, image-classification | Code Available | 0 | 5 |
| Unsupervised Image Captioning | Nov 27, 2018 | Image Captioning, Image Description | Code Available | 0 | 5 |
| Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings | Mar 30, 2016 | Image Description, Image Retrieval | Code Available | 0 | 5 |
| Varying image description tasks: spoken versus written descriptions | Aug 1, 2018 | Image Description | Code Available | 0 | 5 |
| VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Mar 10, 2025 | Image Description, Multiple-choice | Code Available | 0 | 5 |
| What a neural language model tells us about spatial relations | Jun 1, 2019 | Image Description, Language Modeling | Code Available | 0 | 5 |
| A Fine-Grained Image Description Generation Method Based on Joint Objectives | Sep 2, 2023 | Image Description, Object | Unverified | 0 | 0 |
| Collecting Image Description Datasets using Crowdsourcing | Nov 12, 2014 | Image Description, Sentence | Unverified | 0 | 0 |
| Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation | May 2, 2022 | Image Description, Machine Translation | Unverified | 0 | 0 |
| Adaptive Color Attributes for Real-Time Visual Tracking | Jun 1, 2014 | Attribute, Image Description | Unverified | 0 | 0 |
| Tell Me More: A Dataset of Visual Scene Description Sequences | Oct 1, 2019 | Image Description, Sentence | Unverified | 0 | 0 |
| Im2Text: Describing Images Using 1 Million Captioned Photographs | Dec 1, 2011 | Image Captioning, Image Description | Unverified | 0 | 0 |
| Image Description Dataset for Language Learners | Jun 1, 2022 | Image Description, Sentence | Unverified | 0 | 0 |
| Image Description using Visual Dependency Representations | Oct 1, 2013 | Image Description, Image Retrieval | Unverified | 0 | 0 |
| Image Pivoting for Learning Multilingual Multimodal Representations | Jul 24, 2017 | Image Description, Image Retrieval | Unverified | 0 | 0 |
| Boli: A dataset for understanding stuttering experience and analyzing stuttered speech | Jan 27, 2025 | Image Description | Unverified | 0 | 0 |
| Impressions: Understanding Visual Semiotics and Aesthetic Impact | Oct 27, 2023 | Image Captioning, Image Description | Unverified | 0 | 0 |
| Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments | Jun 23, 2019 | Image Description, Person Re-Identification | Unverified | 0 | 0 |
| InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | Dec 21, 2023 | Image Description | Unverified | 0 | 0 |
| LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Mar 21, 2025 | Code Generation, Deep Reinforcement Learning | Unverified | 0 | 0 |
| Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images | May 31, 2024 | Anatomy, Image Description | Unverified | 0 | 0 |
| Textual Visual Semantic Dataset for Text Spotting | Apr 21, 2020 | Image Description, text similarity | Unverified | 0 | 0 |
| Learning Action Concept Trees and Semantic Alignment Networks from Image-Description Data | Sep 8, 2016 | Action Classification, Classification | Unverified | 0 | 0 |
| Local Higher-Order Statistics (LHS) describing images with statistics of local non-binarized pixel patterns | Oct 2, 2015 | Image Description, Quantization | Unverified | 0 | 0 |
| Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism | Apr 23, 2025 | Decoder, Image Description | Unverified | 0 | 0 |
| The Image Torque Operator for Contour Processing | Jan 18, 2016 | Edge Detection, Image Description | Unverified | 0 | 0 |
| The Lexical Gap: An Improved Measure of Automated Image Description Quality | May 1, 2019 | 2k, Diversity | Unverified | 0 | 0 |
| The Long-Short Story of Movie Description | Jun 4, 2015 | Image Captioning, Image Description | Unverified | 0 | 0 |
| The Task Matters: Comparing Image Captioning and Task-Based Dialogical Image Description | Nov 1, 2018 | Image Captioning, Image Description | Unverified | 0 | 0 |
| A Genetic Algorithm Approach for Image Representation Learning through Color Quantization | Nov 18, 2017 | Content-Based Image Retrieval, Image Description | Unverified | 0 | 0 |
| Mind's Eye: A Recurrent Visual Representation for Image Caption Generation | Jun 1, 2015 | Caption Generation, Image Description | Unverified | 0 | 0 |
| Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures | Jan 15, 2016 | Image Description, Retrieval | Unverified | 0 | 0 |
| A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching | Jun 1, 2013 | Image Description, Video Description | Unverified | 0 | 0 |
| A Shared Task on Multimodal Machine Translation and Crosslingual Image Description | Aug 1, 2016 | Image Description, Image Retrieval | Unverified | 0 | 0 |
| Data-augmented phrase-level alignment for mitigating object hallucination | May 28, 2024 | Data Augmentation, Hallucination | Unverified | 0 | 0 |
| Adding the Third Dimension to Spatial Relation Detection in 2D Images | Nov 1, 2018 | Image Description, Object | Unverified | 0 | 0 |
| Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | Jun 1, 2022 | Caption Generation, image-classification | Unverified | 0 | 0 |
| TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models | Nov 2, 2024 | Image Description, Image Generation | Unverified | 0 | 0 |
| Multimodal fusion via cortical network inspired losses | May 1, 2022 | Emotion Recognition, Image Description | Unverified | 0 | 0 |
| Multi-modal gated recurrent units for image description | Apr 20, 2019 | Image Description, Sentence | Unverified | 0 | 0 |
| Multimodal Machine Translation with Reinforcement Learning | May 7, 2018 | Image Description, Machine Translation | Unverified | 0 | 0 |