| CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Oct 15, 2024 | Image-text Retrieval, Text Retrieval | Unverified | 0 |
| AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models | Oct 7, 2024 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | Sep 28, 2024 | Image-text Retrieval, Semantic Similarity | Code Available | 0 |
| NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations | Sep 11, 2024 | Image-text Retrieval, Text Retrieval | Unverified | 0 |
| Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation | Aug 2, 2024 | Image-text Retrieval, Retrieval | Unverified | 0 |
| FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis | Jul 29, 2024 | Image-text Retrieval, Model Selection | Code Available | 0 |
| Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective | Jul 21, 2024 | Image-text Retrieval, Information Retrieval | Unverified | 0 |
| Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval | Jul 17, 2024 | Image-text Retrieval, Object | Code Available | 0 |
| CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | Jul 10, 2024 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |