| Im2Text: Describing Images Using 1 Million Captioned Photographs | Dec 1, 2011 | Image Captioning, Image Description | —Unverified | 0 | 0 |
| Image Description Dataset for Language Learners | Jun 1, 2022 | Image Description, Sentence | —Unverified | 0 | 0 |
| Image Description using Visual Dependency Representations | Oct 1, 2013 | Image Description, Image Retrieval | —Unverified | 0 | 0 |
| Image Pivoting for Learning Multilingual Multimodal Representations | Jul 24, 2017 | Image Description, Image Retrieval | —Unverified | 0 | 0 |
| Boli: A dataset for understanding stuttering experience and analyzing stuttered speech | Jan 27, 2025 | Image Description | —Unverified | 0 | 0 |
| Impressions: Understanding Visual Semiotics and Aesthetic Impact | Oct 27, 2023 | Image Captioning, Image Description | —Unverified | 0 | 0 |
| Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments | Jun 23, 2019 | Image Description, Person Re-Identification | —Unverified | 0 | 0 |
| InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | Dec 21, 2023 | Image Description | —Unverified | 0 | 0 |
| LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Mar 21, 2025 | Code Generation, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images | May 31, 2024 | Anatomy, Image Description | —Unverified | 0 | 0 |