| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Jul 25, 2024 | Image to text, Language Modeling | —Unverified | 0 |
| Cross-modal Contrastive Attention Model for Medical Report Generation | Oct 1, 2022 | Image to text, Medical Report Generation | —Unverified | 0 |
| BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | Mar 24, 2024 | Diagnostic, Image Retrieval | —Unverified | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | Sep 17, 2020 | cross-modal alignment, Image to text | —Unverified | 0 |
| Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval | Dec 4, 2023 | Attribute, Cross-Modal Person Re-Identification | —Unverified | 0 |
| BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification | Sep 9, 2023 | Image to text, Language Modeling | —Unverified | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | Mar 10, 2023 | Image to text, text-to-speech | —Unverified | 0 |
| COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | Apr 15, 2022 | Contrastive Learning, Cross-Modal Retrieval | —Unverified | 0 |
| Contrastive Learning of Visual-Semantic Embeddings | Oct 17, 2021 | Contrastive Learning, image-classification | —Unverified | 0 |
| Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | Nov 18, 2023 | Image to text, Semantic Similarity | —Unverified | 0 |