| DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Dec 8, 2022 | Diversity, Image Description | Code Available | 1 | 5 |
| A skeletonization algorithm for gradient-based optimization | Sep 5, 2023 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models | Mar 4, 2025 | Image Description | Code Available | 1 | 5 |
| Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | May 22, 2025 | Hallucination, Image Description | Code Available | 1 | 5 |
| Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | May 19, 2015 | Image Description, Phrase Grounding | Code Available | 1 | 5 |
| Can Large Multimodal Models Uncover Deep Semantics Behind Images? | Feb 17, 2024 | Image Description | Code Available | 1 | 5 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1 | 5 |
| Chatting Makes Perfect: Chat-based Image Retrieval | May 31, 2023 | Chat-based Image Retrieval, Image Description | Code Available | 1 | 5 |
| Grounded Video Description | Dec 17, 2018 | Image Description, Sentence | Code Available | 1 | 5 |
| Text-Visual Semantic Constrained AI-Generated Image Quality Assessment | Jul 14, 2025 | Image Description, Image Quality Assessment | Code Available | 1 | 5 |