| GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | Nov 2, 2023 | Image Generation, Image to text | —Unverified | 0 |
| GPC: Generative and General Pathology Image Classifier | Jul 12, 2024 | Classification, image-classification | —Unverified | 0 |
| Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | Nov 18, 2023 | Image to text, Semantic Similarity | —Unverified | 0 |
| ABC: Achieving Better Control of Multimodal Embeddings using VLMs | Mar 1, 2025 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Jan 5, 2024 | Image Comprehension, Image to text | —Unverified | 0 |
| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Nov 5, 2024 | Change Detection, Contrastive Learning | —Unverified | 0 |
| Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | Oct 24, 2024 | Image to text, Image-Variation | —Unverified | 0 |
| Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | May 16, 2025 | Cross-Modal Retrieval, Image to text | —Unverified | 0 |
| From Image to Text in Sentiment Analysis via Regression and Deep Learning | Sep 1, 2019 | Image to text, regression | —Unverified | 0 |
| CoBIT: A Contrastive Bi-directional Image-Text Generation Model | Mar 23, 2023 | Decoder, Image Generation | —Unverified | 0 |