| Efficient Multilingual Multi-modal Pre-training through Triple Contrastive Loss | Oct 1, 2022 | Image Classification | —Unverified | 0 | 0 |
| Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples | Mar 5, 2024 | Concept Alignment, Contrastive Learning | —Unverified | 0 | 0 |
| EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models | May 24, 2025 | Image-text Retrieval, Language Modeling | —Unverified | 0 | 0 |
| EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | Aug 23, 2023 | Image-text matching, Image-text Retrieval | —Unverified | 0 | 0 |
| Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | Dec 10, 2024 | Contrastive Learning, Image-text Retrieval | —Unverified | 0 | 0 |
| Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach | Feb 10, 2025 | Federated Learning, Image-text Retrieval | —Unverified | 0 | 0 |
| FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | Apr 11, 2025 | Image Classification | —Unverified | 0 | 0 |
| Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks | Aug 13, 2023 | Contrastive Learning, Image Classification | —Unverified | 0 | 0 |
| GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks | Jan 1, 2023 | Image Generation, Image-text Retrieval | —Unverified | 0 | 0 |
| Generative Negative Text Replay for Continual Vision-Language Pretraining | Oct 31, 2022 | Continual Learning, Image Classification | —Unverified | 0 | 0 |