| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Nov 2, 2022 | Contrastive Learning, image-classification | Code Available | 5 |
| AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Nov 12, 2022 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 4 |
| Cross-lingual and Multilingual CLIP | Jun 1, 2022 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| FLAVA: A Foundational Language And Vision Alignment Model | Dec 8, 2021 | Image Retrieval, Image-to-Text Retrieval | Code Available | 1 |
| Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval | Sep 28, 2023 | Attribute, Image Retrieval | Code Available | 1 |
| FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing | May 27, 2023 | Graph Similarity, Human Judgment Correlation | Code Available | 1 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1 |
| General Image Descriptors for Open World Image Retrieval using ViT CLIP | Oct 20, 2022 | Image Retrieval, Retrieval | Code Available | 1 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Dec 21, 2023 | Image Retrieval, Image-to-Text Retrieval | Code Available | 1 |
| Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval | Feb 6, 2023 | Attribute, Composed Image Retrieval (CoIR) | Code Available | 1 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | May 8, 2022 | image-classification, Image Classification | Code Available | 1 |
| Energy Confused Adversarial Metric Learning for Zero-Shot Image Retrieval and Clustering | Jan 22, 2019 | Clustering, Image Retrieval | Unverified | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | May 16, 2021 | Graph Generation, Image Captioning | Unverified | 0 |
| Learning with Succinct Common Representation Based on Wyner's Common Information | May 27, 2019 | Density Ratio Estimation, Image Retrieval | Unverified | 0 |
| Zero-Shot Hashing via Transferring Supervised Knowledge | Jun 16, 2016 | Image Retrieval, Retrieval | Unverified | 0 |
| Piecewise-Linear Manifolds for Deep Metric Learning | Mar 22, 2024 | Image Retrieval, Metric Learning | Unverified | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Attribute-Modulated Generative Meta Learning for Zero-Shot Classification | Apr 22, 2021 | Attribute, Classification | Unverified | 0 |
| Curriculum Learning for Data-Efficient Vision-Language Alignment | Jul 29, 2022 | Contrastive Learning, Image Retrieval | Unverified | 0 |
| Attribute-Guided Network for Cross-Modal Zero-Shot Hashing | Feb 6, 2018 | Attribute, Cross-Modal Retrieval | Unverified | 0 |
| Full-attention based Neural Architecture Search using Context Auto-regression | Nov 13, 2021 | Fine-Grained Image Recognition, image-classification | Unverified | 0 |
| GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | Aug 22, 2023 | image-classification, Image Classification | Unverified | 0 |
| Hybrid-Attention based Decoupled Metric Learning for Zero-Shot Image Retrieval | Jul 27, 2019 | Image Retrieval, Metric Learning | Unverified | 0 |
| Visual Representation Learning with Self-Supervised Attention for Low-Label High-data Regime | Jan 22, 2022 | Few-Shot Image Classification, image-classification | Code Available | 0 |
| ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | Sep 30, 2022 | Computational Efficiency, Contrastive Learning | Code Available | 0 |
| Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jan 23, 2025 | Image Retrieval, Retrieval | Code Available | 0 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Feb 14, 2022 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers | Jan 31, 2021 | Image Retrieval, Retrieval | Code Available | 0 |
| M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Jan 29, 2024 | GPU, zero-shot-classification | Code Available | 0 |