| Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module | Mar 24, 2025 | Image to text, Medical Report Generation | Unverified | 0 |
| PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Mar 20, 2025 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 0 |
| Natural Language Generation | Mar 20, 2025 | Image Captioning, Image to text | Unverified | 0 |
| Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Mar 19, 2025 | Image to text | Code Available | 0 |
| MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection | Mar 17, 2025 | Anomaly Detection, Form | Unverified | 0 |
| ABC: Achieving Better Control of Multimodal Embeddings using VLMs | Mar 1, 2025 | Image to text, Image-to-Text Retrieval | Unverified | 0 |
| On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation | Feb 26, 2025 | Cross-Modal Retrieval, Hallucination | Unverified | 0 |
| Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Feb 18, 2025 | Image to text, Optical Character Recognition | Code Available | 0 |
| Natural Language Generation from Visual Sequences: Challenges and Future Directions | Feb 18, 2025 | Image to text, Text Generation | Unverified | 0 |
| UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation | Feb 16, 2025 | Binary Classification, Fake News Detection | Unverified | 0 |