| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | Image Classification | —Unverified | 0 | 0 |
| Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples | May 25, 2024 | Active Learning, Image-text Retrieval | —Unverified | 0 | 0 |
| Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | Jan 1, 2025 | Image-text Retrieval, Image to text | —Unverified | 0 | 0 |
| AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection | Apr 28, 2025 | Adversarial Attack, Anomaly Detection | —Unverified | 0 | 0 |
| Anatomy-Aware Conditional Image-Text Retrieval | Mar 10, 2025 | Anatomy, Contrastive Learning | —Unverified | 0 | 0 |
| AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models | Oct 7, 2024 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment | Nov 30, 2024 | Image-text Retrieval, Representation Learning | —Unverified | 0 | 0 |
| Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective | Jul 21, 2024 | Image-text Retrieval, Information Retrieval | —Unverified | 0 | 0 |
| Asymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding For Image & Text Retrieval | Nov 19, 2015 | Image-text Retrieval, Model Selection | —Unverified | 0 | 0 |