| BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | Mar 24, 2024 | Diagnostic, Image Retrieval | —Unverified | 0 | 0 |
| BRIT: Bidirectional Retrieval over Unified Image-Text Graph | May 24, 2025 | Image to text, Question Answering | —Unverified | 0 | 0 |
| Canonical Correlation Analysis for Misaligned Satellite Image Change Detection | Dec 21, 2018 | Action Recognition, Change Detection | —Unverified | 0 | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | Jun 1, 2023 | Caption Generation, Image to text | —Unverified | 0 | 0 |
| Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models | Feb 13, 2024 | Image Captioning, Image to text | —Unverified | 0 | 0 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | —Unverified | 0 | 0 |
| VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval | Feb 13, 2023 | Cross-Modal Information Retrieval, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | Mar 7, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 | 0 |
| CoBIT: A Contrastive Bi-directional Image-Text Generation Model | Mar 23, 2023 | Decoder, Image Generation | —Unverified | 0 | 0 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Jan 5, 2024 | Image Comprehension, Image to text | —Unverified | 0 | 0 |