| An analysis of vision-language models for fabric retrieval | Jul 7, 2025 | Attribute; Cross-Modal Retrieval | Unverified | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning; cross-modal alignment | Unverified | 0 |
| MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Mar 28, 2024 | Image Retrieval; Implicit Relations | Code Available | 3 |
| M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Jan 29, 2024 | GPU; zero-shot-classification | Code Available | 0 |
| Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Jan 1, 2024 | cross-modal alignment; Cross-Modal Retrieval | Code Available | 2 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi; Action Classification | Code Available | 3 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval; Phrase Grounding | Unverified | 0 |
| Sigmoid Loss for Language Image Pre-Training | Mar 27, 2023 | Contrastive Learning; Disentanglement | Code Available | 3 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Jan 30, 2023 | Generative Visual Question Answering; Image Captioning | Code Available | 4 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Nov 2, 2022 | Contrastive Learning; image-classification | Code Available | 5 |