| Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | May 16, 2025 | Cross-Modal Retrieval, Image to text | —Unverified | 0 |
| ABC: Achieving Better Control of Multimodal Embeddings using VLMs | Mar 1, 2025 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| Accept the Modality Gap: An Exploration in the Hyperbolic Space | Jan 1, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |
| Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | Jan 1, 2025 | Image-text Retrieval, Image to text | —Unverified | 0 |
| AICoderEval: Improving AI Domain Code Generation of Large Language Models | Jun 7, 2024 | Code Generation, Image to text | —Unverified | 0 |
| AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method | Nov 16, 2023 | Image to text, Object | —Unverified | 0 |
| An End-to-End Neural Network for Image-to-Audio Transformation | Mar 10, 2023 | Image to text, text-to-speech | —Unverified | 0 |
| An Online Learning Approach to Prompt-based Selection of Generative Models | Oct 17, 2024 | Image to text | —Unverified | 0 |
| Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models | Aug 16, 2024 | Image to text | —Unverified | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | —Unverified | 0 |
| Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition | Mar 4, 2024 | Image to text | —Unverified | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | Feb 21, 2024 | Benchmarking, Image to text | —Unverified | 0 |
| Backdooring Vision-Language Models with Out-Of-Distribution Data | Oct 2, 2024 | Image Captioning, Image to text | —Unverified | 0 |
| Better Text Understanding Through Image-To-Text Transfer | May 23, 2017 | Image to text | —Unverified | 0 |
| Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | Oct 24, 2024 | Image to text, Image-Variation | —Unverified | 0 |
| Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | Nov 18, 2023 | Image to text, Semantic Similarity | —Unverified | 0 |
| BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification | Sep 9, 2023 | Image to text, Language Modeling | —Unverified | 0 |
| BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | Mar 24, 2024 | Diagnostic, Image Retrieval | —Unverified | 0 |
| BRIT: Bidirectional Retrieval over Unified Image-Text Graph | May 24, 2025 | Image to text, Question Answering | —Unverified | 0 |
| Canonical Correlation Analysis for Misaligned Satellite Image Change Detection | Dec 21, 2018 | Action Recognition, Change Detection | —Unverified | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | Jun 1, 2023 | Caption Generation, Image to text | —Unverified | 0 |
| Captions Are Worth a Thousand Words: Enhancing Product Retrieval with Pretrained Image-to-Text Models | Feb 13, 2024 | Image Captioning, Image to text | —Unverified | 0 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | —Unverified | 0 |
| VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval | Feb 13, 2023 | Cross-Modal Information Retrieval, Cross-Modal Retrieval | —Unverified | 0 |
| CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | Mar 7, 2024 | Image to text, Image-to-Text Retrieval | —Unverified | 0 |