| Image-text Retrieval via Preserving Main Semantics of Vision | Apr 20, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 | 5 |
| FlexiViT: One Model for All Patch Sizes | Dec 15, 2022 | All, Image-text Retrieval | Code Available | 1 | 5 |
| CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Jun 19, 2021 | Image Retrieval, Image-text Retrieval | Code Available | 1 | 5 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | Benchmarking, Image Captioning | Code Available | 1 | 5 |
| Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | May 11, 2023 | Contrastive Learning, Image-text Retrieval | Code Available | 1 | 5 |
| Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models | Jul 26, 2023 | Image-text Retrieval, Retrieval | Code Available | 1 | 5 |
| Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Jun 10, 2023 | Image-text Retrieval, Medical Report Generation | Code Available | 1 | 5 |
| UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching | Jul 11, 2024 | Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence | Code Available | 1 | 5 |
| An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing | Feb 26, 2022 | Image-text Retrieval, Meta-Learning | Code Available | 0 | 5 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 | 5 |