| ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning | Feb 27, 2025 | Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence | Code Available | 1 |
| Progressive Local Alignment for Medical Multimodal Pre-training | Feb 25, 2025 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 20, 2025 | Fairness, Image-text Retrieval | Code Available | 0 |
| Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach | Feb 10, 2025 | Federated Learning, Image-text Retrieval | Unverified | 0 |
| Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Feb 9, 2025 | Image Captioning, Image-text Retrieval | Code Available | 3 |
| DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Feb 7, 2025 | Anomaly Detection, Image-text Retrieval | Unverified | 0 |
| MASS: Overcoming Language Bias in Image-Text Matching | Jan 20, 2025 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval | Jan 19, 2025 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |
| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Jan 13, 2025 | Articles, Image-text Retrieval | Code Available | 2 |
| Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | Jan 1, 2025 | Image-text Retrieval, Image to text | Unverified | 0 |