| LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | Mar 21, 2025 | Code Generation, Deep Reinforcement Learning | Unverified | 0 |
| VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Mar 10, 2025 | Image Description, Multiple-choice | Code Available | 0 |
| Boli: A dataset for understanding stuttering experience and analyzing stuttered speech | Jan 27, 2025 | Image Description | Unverified | 0 |
| IDEA: Image Description Enhanced CLIP-Adapter | Jan 15, 2025 | Few-Shot Image Classification, image-classification | Code Available | 0 |
| Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis | Jan 13, 2025 | Image Description, Transfer Learning | Unverified | 0 |
| A Preliminary Survey of Semantic Descriptive Model for Images | Jan 13, 2025 | Descriptive, Image Description | Unverified | 0 |
| RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback | Jan 1, 2025 | Hallucination, Image Comprehension | Code Available | 0 |
| Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis | Dec 4, 2024 | Image Captioning, Image Description | Unverified | 0 |
| TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models | Nov 2, 2024 | Image Description, Image Generation | Unverified | 0 |
| MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps | Oct 18, 2024 | Image Description, Informativeness | Code Available | 0 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 |
| Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images | May 31, 2024 | Anatomy, Image Description | Unverified | 0 |
| Data-augmented phrase-level alignment for mitigating object hallucination | May 28, 2024 | Data Augmentation, Hallucination | Unverified | 0 |
| WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization | May 28, 2024 | Domain Generalization, Image Description | Unverified | 0 |
| MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity, Image Description | Code Available | 0 |
| Artwork Explanation in Large-scale Vision Language Models | Feb 29, 2024 | Explanation Generation, Image Description | Unverified | 0 |
| A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models | Feb 28, 2024 | Image Description, Question Answering | Unverified | 0 |
| Seeing the Unseen: Visual Common Sense for Semantic Placement | Jan 15, 2024 | Common Sense Reasoning, Image Description | Unverified | 0 |
| InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | Dec 21, 2023 | Image Description | Unverified | 0 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Dec 8, 2023 | Image Description, Instruction Following | Code Available | 0 |
| Impressions: Understanding Visual Semiotics and Aesthetic Impact | Oct 27, 2023 | Image Captioning, Image Description | Unverified | 0 |
| Large Language Models can Share Images, Too! | Oct 23, 2023 | Image Description, Sentence | Code Available | 0 |
| Bounding and Filling: A Fast and Flexible Framework for Image Captioning | Oct 15, 2023 | Image Captioning, Image Description | Code Available | 0 |
| ContextRef: Evaluating Referenceless Metrics For Image Description Generation | Sep 21, 2023 | Image Description | Code Available | 0 |
| A Fine-Grained Image Description Generation Method Based on Joint Objectives | Sep 2, 2023 | Image Description, Object | Unverified | 0 |