| RWKV-CLIP: A Robust Vision-Language Representation Learner | Jun 11, 2024 | Image-text Retrieval, Representation Learning | Code Available | 2 |
| Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval | Jun 9, 2024 | Image-text Retrieval, Person Retrieval | Unverified | 0 |
| Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training | May 30, 2024 | Image-text Retrieval, Language Modeling | Unverified | 0 |
| Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships | May 29, 2024 | Adversarial Defense, Adversarial Robustness | Unverified | 0 |
| Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | May 29, 2024 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Accelerating Transformers with Spectrum-Preserving Token Merging | May 25, 2024 | image-classification, Image Classification | Code Available | 2 |
| Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples | May 25, 2024 | Active Learning, Image-text Retrieval | Unverified | 0 |
| PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | May 16, 2024 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| Global–Local Information Soft-Alignment for Cross-Modal Remote-Sensing Image–Text Retrieval | May 14, 2024 | Cross-Modal Retrieval, Cross-Modal Retrieval on RSITMD | Unverified | 0 |
| UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation | Apr 22, 2024 | Diversity, Domain Adaptation | Unverified | 0 |