| From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Nov 5, 2024 | Change Detection, Contrastive Learning | —Unverified | 0 | 0 |
| GPC: Generative and General Pathology Image Classifier | Jul 12, 2024 | Classification, image-classification | —Unverified | 0 | 0 |
| GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks | Nov 2, 2023 | Image Generation, Image to text | —Unverified | 0 | 0 |
| GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | Aug 22, 2023 | image-classification, Image Classification | —Unverified | 0 | 0 |
| Hierarchical Gumbel Attention Network for Text-based Person Search | Oct 10, 2020 | Image Retrieval, Image to text | —Unverified | 0 | 0 |
| HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | Jul 8, 2024 | Contrastive Learning, Image Retrieval | —Unverified | 0 | 0 |
| I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation | Mar 20, 2017 | Caption Generation, Data Augmentation | —Unverified | 0 | 0 |
| Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks | Oct 11, 2019 | Generative Adversarial Network, Image-to-Image Translation | —Unverified | 0 | 0 |
| Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Nov 8, 2024 | Image Captioning, Image Generation | —Unverified | 0 | 0 |
| Image Captioners Sometimes Tell More Than Images They See | May 4, 2023 | Descriptive, Image Captioning | —Unverified | 0 | 0 |
| Image Semantic Relation Generation | Oct 19, 2022 | Image Retrieval, Image Segmentation | —Unverified | 0 | 0 |
| Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module | Mar 24, 2025 | Image to text, Medical Report Generation | —Unverified | 0 | 0 |
| Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything | Jul 1, 2024 | Image to text, Language Modeling | —Unverified | 0 | 0 |
| Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation | Nov 23, 2024 | Cross-Modal Retrieval, Image to text | —Unverified | 0 | 0 |
| Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | Jun 12, 2025 | cross-modal alignment, Image to text | —Unverified | 0 | 0 |
| Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling | Mar 13, 2023 | Decoder, Image to text | —Unverified | 0 | 0 |
| Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards | Oct 21, 2022 | Image to text, named-entity-recognition | —Unverified | 0 | 0 |
| Instruction Tuning-free Visual Token Complement for Multimodal LLMs | Aug 9, 2024 | Image Generation, Image to text | —Unverified | 0 | 0 |
| Interpreting Vision and Language Generative Models with Semantic Visual Priors | Apr 28, 2023 | Image to text | —Unverified | 0 | 0 |
| Is Cross-modal Information Retrieval Possible without Training? | Apr 20, 2023 | Contrastive Learning, Cross-Modal Information Retrieval | —Unverified | 0 | 0 |
| I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models | Jun 13, 2023 | Adversarial Attack, Decoder | —Unverified | 0 | 0 |
| Knowledge Aware Semantic Concept Expansion for Image-Text Matching | Aug 10, 2019 | Common Sense Reasoning, Content-Based Image Retrieval | —Unverified | 0 | 0 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | Mar 15, 2021 | Caption Generation, Descriptive | —Unverified | 0 | 0 |
| Semantically Grounded QFormer for Efficient Vision Language Understanding | Nov 13, 2023 | Diversity, Image to text | —Unverified | 0 | 0 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | —Unverified | 0 | 0 |