| Multi-label Cluster Discrimination for Visual Representation Learning | Jul 24, 2024 | Contrastive Learning, Image-text Retrieval | Code Available | 4 |
| Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective | Jul 21, 2024 | Image-text Retrieval, Information Retrieval | Unverified | 0 |
| Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval | Jul 17, 2024 | Image-text Retrieval, Object | Code Available | 0 |
| UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching | Jul 11, 2024 | Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence | Code Available | 1 |
| CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | Jul 10, 2024 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? | Jul 10, 2024 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Jul 1, 2024 | Image-text Retrieval, Question Answering | Code Available | 1 |
| Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning | Jun 26, 2024 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 0 |
| Composing Object Relations and Attributes for Image-Text Matching | Jun 17, 2024 | Attribute, Graph Attention | Code Available | 1 |
| Towards Vision-Language Geo-Foundation Model: A Survey | Jun 13, 2024 | Earth Observation, Image Captioning | Code Available | 2 |
| RWKV-CLIP: A Robust Vision-Language Representation Learner | Jun 11, 2024 | Image-text Retrieval, Representation Learning | Code Available | 2 |
| Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval | Jun 9, 2024 | Image-text Retrieval, Person Retrieval | Unverified | 0 |
| Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training | May 30, 2024 | Image-text Retrieval, Language Modeling | Unverified | 0 |
| Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | May 29, 2024 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships | May 29, 2024 | Adversarial Defense, Adversarial Robustness | Unverified | 0 |
| Accelerating Transformers with Spectrum-Preserving Token Merging | May 25, 2024 | image-classification, Image Classification | Code Available | 2 |
| Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples | May 25, 2024 | Active Learning, Image-text Retrieval | Unverified | 0 |
| PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | May 16, 2024 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| Global–Local Information Soft-Alignment for Cross-Modal Remote-Sensing Image–Text Retrieval | May 14, 2024 | Cross-Modal Retrieval, Cross-Modal Retrieval on RSITMD | Unverified | 0 |
| UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation | Apr 22, 2024 | Diversity, Domain Adaptation | Unverified | 0 |
| Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement | Apr 6, 2024 | Image-text Retrieval, object-detection | Unverified | 0 |
| M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Mar 31, 2024 | Image-text Retrieval, Language Modeling | Code Available | 3 |
| DreamLIP: Language-Image Pre-training with Long Captions | Mar 25, 2024 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Mar 19, 2024 | Diagnostic, image-classification | Code Available | 1 |
| Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction | Mar 16, 2024 | Adversarial Robustness, Image-text Retrieval | Unverified | 0 |
| LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | Mar 16, 2024 | Caption Generation, Image-text Retrieval | Unverified | 0 |
| Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Mar 8, 2024 | Image-text Retrieval, Retrieval | Code Available | 2 |
| Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples | Mar 5, 2024 | Concept Alignment, Contrastive Learning | Unverified | 0 |
| Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning | Jan 30, 2024 | Diversity, Image-text Retrieval | Code Available | 0 |
| Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching, Image-text Retrieval | Code Available | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Jan 4, 2024 | Image Captioning, image-classification | Unverified | 0 |
| Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data | Dec 11, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models | Dec 1, 2023 | image-classification, Image Classification | Unverified | 0 |
| MLLMs-Augmented Visual-Language Representation Learning | Nov 30, 2023 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption Generation, Image-text Retrieval | Unverified | 0 |
| A New Fine-grained Alignment Method for Image-text Matching | Nov 3, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | Oct 30, 2023 | cross-modal alignment, Image-text Retrieval | Unverified | 0 |
| A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Oct 27, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 |
| Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Oct 19, 2023 | Action Recognition, Image-text Retrieval | Code Available | 2 |
| Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | Oct 12, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| VeCLIP: Improving CLIP Training via Visual-enriched Captions | Oct 11, 2023 | Image-text Retrieval, Retrieval | Code Available | 2 |
| ESA: External Space Attention Aggregation for Image-Text Retrieval | Oct 10, 2023 | Image-text Retrieval, Retrieval | Code Available | 1 |
| Constructing Image-Text Pair Dataset from Books | Oct 3, 2023 | Image-text Retrieval, Optical Character Recognition (OCR) | Unverified | 0 |
| Dual Relation Alignment for Composed Image Retrieval | Sep 5, 2023 | Image Retrieval, Image-text Retrieval | Unverified | 0 |
| MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval | Sep 4, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 |
| Contrastive Feature Masking Open-Vocabulary Vision Transformer | Sep 2, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment | Aug 27, 2023 | Contrastive Learning, Image-text Retrieval | Code Available | 1 |
| Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval | Aug 24, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 1 |
| DLIP: Distilling Language-Image Pre-training | Aug 24, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |