| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | —Unverified | 0 |
| TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | May 24, 2025 | Image Captioning, Image Generation | —Unverified | 0 |
| BRIT: Bidirectional Retrieval over Unified Image-Text Graph | May 24, 2025 | Image to text, Question Answering | —Unverified | 0 |
| Robustifying Vision-Language Models via Dynamic Token Reweighting | May 22, 2025 | Image to text | —Unverified | 0 |
| UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | May 17, 2025 | Image to text, Information Retrieval | Code Available | 0 |
| Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | May 16, 2025 | Cross-Modal Retrieval, Image to text | —Unverified | 0 |
| X-Fusion: Introducing New Modality to Frozen Large Language Models | Apr 29, 2025 | Image to text | —Unverified | 0 |
| SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Apr 17, 2025 | Cross-Modal Retrieval, Image Retrieval | —Unverified | 0 |
| DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | Apr 16, 2025 | Contrastive Learning, Image to text | —Unverified | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | Apr 15, 2025 | Contrastive Learning, cross-modal alignment | —Unverified | 0 |