| X-Fusion: Introducing New Modality to Frozen Large Language Models | Apr 29, 2025 | Image to text | —Unverified | 0 | 0 |
| 15M Multimodal Facial Image-Text Dataset | Jul 11, 2024 | Image to text | —Unverified | 0 | 0 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | May 16, 2025 | Cross-Modal Retrieval, Image to text | —Unverified | 0 | 0 |
| ABC: Achieving Better Control of Multimodal Embeddings using VLMs | Mar 1, 2025 | Image to text, Image-to-Text Retrieval | —Unverified | 0 | 0 |
| Accept the Modality Gap: An Exploration in the Hyperbolic Space | Jan 1, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 | 0 |
| Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | Jan 1, 2025 | Image-text Retrieval, Image to text | —Unverified | 0 | 0 |
| AICoderEval: Improving AI Domain Code Generation of Large Language Models | Jun 7, 2024 | Code Generation, Image to text | —Unverified | 0 | 0 |
| AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method | Nov 16, 2023 | Image to text, Object | —Unverified | 0 | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | Mar 10, 2023 | Image to text, text-to-speech | —Unverified | 0 | 0 |