| Captured by Captions: On Memorization and its Mitigation in CLIP Models | Feb 11, 2025 | Image Retrieval, Memorization | Unverified | 0 |
| DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Feb 7, 2025 | Anomaly Detection, Image-text Retrieval | Unverified | 0 |
| LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models | Feb 6, 2025 | zero-shot-classification, Zero-shot Generalization | Code Available | 1 |
| Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding | Jan 24, 2025 | Anatomy, Contrastive Learning | Code Available | 2 |
| Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jan 23, 2025 | Image Retrieval, Retrieval | Code Available | 0 |
| KPL: Training-Free Medical Knowledge Mining of Vision-Language Models | Jan 20, 2025 | Classification, image-classification | Code Available | 0 |
| FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing | Jan 14, 2025 | Classification, Contrastive Learning | Unverified | 0 |
| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Jan 13, 2025 | Articles, Image-text Retrieval | Code Available | 2 |
| A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI | Jan 8, 2025 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 |