| Impressions: Understanding Visual Semiotics and Aesthetic Impact | Oct 27, 2023 | Image Captioning, Image Description | Unverified | 0 |
| Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments | Jun 23, 2019 | Image Description, Person Re-Identification | Unverified | 0 |
| InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | Dec 21, 2023 | Image Description | Unverified | 0 |
| LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Mar 21, 2025 | Code Generation, Deep Reinforcement Learning | Unverified | 0 |
| Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images | May 31, 2024 | Anatomy, Image Description | Unverified | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignment, Image Captioning | Code Available | 0 |
| Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline) | Nov 26, 2017 | Image Description, Person Re-Identification | Code Available | 0 |
| How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain | Dec 1, 2020 | Image Description | Code Available | 0 |
| IDEA: Image Description Enhanced CLIP-Adapter | Jan 15, 2025 | Few-Shot Image Classification, image-classification | Code Available | 0 |
| Describing Videos by Exploiting Temporal Structure | Feb 27, 2015 | Action Recognition, Image Description | Code Available | 0 |