| AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection | Apr 28, 2025 | Adversarial Attack, Anomaly Detection | Unverified | 0 |
| Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Apr 24, 2025 | Image-text Retrieval, Instruction Following | Unverified | 0 |
| FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | Apr 11, 2025 | Image Classification | Unverified | 0 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | Benchmarking, Image Captioning | Code Available | 1 |
| SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI | Mar 25, 2025 | Contrastive Learning, Image Segmentation | Unverified | 0 |
| Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| Anatomy-Aware Conditional Image-Text Retrieval | Mar 10, 2025 | Anatomy, Contrastive Learning | Unverified | 0 |
| Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings | Mar 5, 2025 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | Mar 4, 2025 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | Image Classification | Unverified | 0 |