| DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | Apr 16, 2025 | Contrastive Learning, Image to text | Unverified | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | Apr 15, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Apr 11, 2025 | Benchmarking, Image Generation | Code available | 1 |
| LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text | Mar 25, 2025 | Cross-Modal Retrieval, Hallucination | Code available | 1 |
| Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module | Mar 24, 2025 | Image to text, Medical Report Generation | Unverified | 0 |
| Natural Language Generation | Mar 20, 2025 | Image Captioning, Image to text | Unverified | 0 |
| PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Mar 20, 2025 | Contrastive Learning, Cross-Modal Retrieval | Code available | 0 |
| Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Mar 19, 2025 | Image to text | Code available | 0 |
| MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection | Mar 17, 2025 | Anomaly Detection, Form | Unverified | 0 |
| FlowTok: Flowing Seamlessly Across Text and Image Tokens | Mar 13, 2025 | Denoising, Image to text | Code available | 5 |