| Captured by Captions: On Memorization and its Mitigation in CLIP Models | Feb 11, 2025 | Image Retrieval, Memorization | Unverified | 0 |
| DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Feb 7, 2025 | Anomaly Detection, Image-text Retrieval | Unverified | 0 |
| LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models | Feb 6, 2025 | zero-shot-classification, Zero-shot Generalization | Code Available | 1 |
| Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding | Jan 24, 2025 | Anatomy, Contrastive Learning | Code Available | 2 |
| Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jan 23, 2025 | Image Retrieval, Retrieval | Code Available | 0 |
| KPL: Training-Free Medical Knowledge Mining of Vision-Language Models | Jan 20, 2025 | Classification, image-classification | Code Available | 0 |
| FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing | Jan 14, 2025 | Classification, Contrastive Learning | Unverified | 0 |
| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Jan 13, 2025 | Articles, Image-text Retrieval | Code Available | 2 |
| A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI | Jan 8, 2025 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 |
| LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries | Jan 3, 2025 | Hallucination, zero-shot-classification | Unverified | 0 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | Jan 1, 2025 | Classification, cross-modal alignment | Unverified | 0 |
| Cross-Modal 3D Representation with Multi-View Images and Point Clouds | Jan 1, 2025 | Autonomous Driving, Cross-Modal Retrieval | Unverified | 0 |
| Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio | Dec 23, 2024 | Contrastive Learning, Prompt Learning | Unverified | 0 |
| DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment | Dec 20, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 0 |
| Adaptive Pruning for Large Language Models with Structural Importance Awareness | Dec 19, 2024 | Text Generation, zero-shot-classification | Unverified | 0 |
| Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings | Dec 18, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| CRoF: CLIP-based Robust Few-shot Learning on Noisy Labels | Dec 17, 2024 | Domain Generalization, Few-Shot Learning | Unverified | 0 |
| A Simple and Efficient Baseline for Zero-Shot Generative Classification | Dec 17, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques | Dec 12, 2024 | Classification, image-classification | Code Available | 0 |
| SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting | Dec 11, 2024 | zero-shot-classification, Zero-Shot Learning | Code Available | 1 |
| Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? | Dec 11, 2024 | Prompt Learning, zero-shot-classification | Code Available | 0 |
| Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | Dec 10, 2024 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| S^3: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models | Dec 6, 2024 | zero-shot-classification, Zero-shot Generalization | Unverified | 0 |
| Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning | Dec 5, 2024 | Comment Generation, Decoder | Code Available | 0 |
| Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Dec 3, 2024 | Classification, Scene Classification | Code Available | 0 |
| Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP | Dec 1, 2024 | Natural Language Understanding, zero-shot-classification | Code Available | 0 |
| Multimodal Whole Slide Foundation Model for Pathology | Nov 29, 2024 | Cross-Modal Retrieval, model | Code Available | 4 |
| CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Nov 28, 2024 | image-classification, Image Classification | Code Available | 1 |
| Active Data Curation Effectively Distills Large-Scale Multimodal Models | Nov 27, 2024 | Decoder, Image Captioning | Unverified | 0 |
| TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models | Nov 24, 2024 | Problem Decomposition, Time Series | Code Available | 1 |
| CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation | Nov 21, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 1 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Nov 15, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 |
| Measuring similarity between embedding spaces using induced neighborhood graphs | Nov 13, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | Nov 11, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| Enhancing Visual Classification using Comparative Descriptors | Nov 8, 2024 | Classification, zero-shot-classification | Code Available | 0 |
| Asterisk*: Keep it Simple | Nov 8, 2024 | Classification, Knowledge Distillation | Unverified | 0 |
| RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Nov 6, 2024 | image-classification, Image Classification | Code Available | 1 |
| ResiDual Transformer Alignment with Spectral Decomposition | Oct 31, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| Active Learning for Vision-Language Models | Oct 29, 2024 | Active Learning, image-classification | Unverified | 0 |
| Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection | Oct 28, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models | Oct 24, 2024 | Classification, In-Context Learning | Unverified | 0 |
| MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Oct 21, 2024 | Diagnostic, Medical Diagnosis | Code Available | 0 |
| Assessing Open-world Forgetting in Generative Image Model Customization | Oct 18, 2024 | Image Generation, zero-shot-classification | Unverified | 0 |
| Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? | Oct 17, 2024 | zero-shot-classification, Zero-Shot Learning | Unverified | 0 |
| LLM Chain Ensembles for Scalable and Accurate Data Annotation | Oct 16, 2024 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| Interpreting and Analysing CLIP's Zero-Shot Image Classification via Mutual Knowledge | Oct 16, 2024 | Classification, image-classification | Code Available | 1 |
| CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Oct 15, 2024 | Image-text Retrieval, Text Retrieval | Unverified | 0 |
| A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks | Oct 10, 2024 | Fairness, Image Captioning | Code Available | 0 |
| GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | Oct 8, 2024 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |