| Progressive Local Alignment for Medical Multimodal Pre-training | Feb 25, 2025 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 20, 2025 | Fairness, Image-text Retrieval | Unverified | 0 |
| Using tournaments to calculate AUROC for zero-shot classification with LLMs | Feb 20, 2025 | Binary Classification, Classification | Unverified | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Feb 19, 2025 | Caption Generation, Classification | Unverified | 0 |
| Text Classification in the LLM Era - Where do we stand? | Feb 17, 2025 | Classification, Sentiment Analysis | Unverified | 0 |
| Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | Feb 13, 2025 | Classification, Prompt Engineering | Unverified | 0 |
| From Haystack to Needle: Label Space Reduction for Zero-shot Classification | Feb 12, 2025 | Classification, zero-shot-classification | Unverified | 0 |
| Captured by Captions: On Memorization and its Mitigation in CLIP Models | Feb 11, 2025 | Image Retrieval, Memorization | Unverified | 0 |
| DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Feb 7, 2025 | Anomaly Detection, Image-text Retrieval | Unverified | 0 |
| Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jan 23, 2025 | Image Retrieval, Retrieval | Code Available | 0 |
| KPL: Training-Free Medical Knowledge Mining of Vision-Language Models | Jan 20, 2025 | Classification, image-classification | Code Available | 0 |
| FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing | Jan 14, 2025 | Classification, Contrastive Learning | Unverified | 0 |
| A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI | Jan 8, 2025 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries | Jan 3, 2025 | Hallucination, zero-shot-classification | Unverified | 0 |
| Cross-Modal 3D Representation with Multi-View Images and Point Clouds | Jan 1, 2025 | Autonomous Driving, Cross-Modal Retrieval | Unverified | 0 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | Jan 1, 2025 | Classification, cross-modal alignment | Unverified | 0 |
| Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio | Dec 23, 2024 | Contrastive Learning, Prompt Learning | Unverified | 0 |
| DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment | Dec 20, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 0 |
| Adaptive Pruning for Large Language Models with Structural Importance Awareness | Dec 19, 2024 | Text Generation, zero-shot-classification | Unverified | 0 |
| Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings | Dec 18, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| CRoF: CLIP-based Robust Few-shot Learning on Noisy Labels | Dec 17, 2024 | Domain Generalization, Few-Shot Learning | Unverified | 0 |
| A Simple and Efficient Baseline for Zero-Shot Generative Classification | Dec 17, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques | Dec 12, 2024 | Classification, image-classification | Code Available | 0 |
| Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? | Dec 11, 2024 | Prompt Learning, zero-shot-classification | Code Available | 0 |
| Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | Dec 10, 2024 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |