| WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization | May 28, 2024 | Domain GeneralizationImage Description | —Unverified | 0 |
| MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | DiversityImage Description | CodeCode Available | 0 |
| Artwork Explanation in Large-scale Vision Language Models | Feb 29, 2024 | Explanation GenerationImage Description | —Unverified | 0 |
| A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models | Feb 28, 2024 | Image DescriptionQuestion Answering | —Unverified | 0 |
| Can Large Multimodal Models Uncover Deep Semantics Behind Images? | Feb 17, 2024 | Image Description | CodeCode Available | 1 |
| Seeing the Unseen: Visual Common Sense for Semantic Placement | Jan 15, 2024 | Common Sense ReasoningImage Description | —Unverified | 0 |
| InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | Dec 21, 2023 | Image Description | —Unverified | 0 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Dec 8, 2023 | Image DescriptionInstruction Following | CodeCode Available | 0 |
| Impressions: Understanding Visual Semiotics and Aesthetic Impact | Oct 27, 2023 | Image CaptioningImage Description | —Unverified | 0 |
| Large Language Models can Share Images, Too! | Oct 23, 2023 | Image DescriptionSentence | CodeCode Available | 0 |