| CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs | Aug 19, 2024 | Hallucination, zero-shot-classification | —Unverified | 0 |
| Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | May 6, 2025 | Representation Learning, zero-shot-classification | —Unverified | 0 |
| Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision | Mar 19, 2024 | Cross-corpus, Emotion Recognition | —Unverified | 0 |
| Compressing Unknown Images With Product Quantizer for Efficient Zero-Shot Classification | Jun 1, 2019 | General Classification, Generalized Zero-Shot Learning | —Unverified | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption Generation, Image-text Retrieval | —Unverified | 0 |
| Image Classification Using a Diffusion Model as a Pre-Training Model | May 11, 2025 | Contrastive Learning, image-classification | —Unverified | 0 |
| Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | Sep 20, 2023 | Audio captioning, Caption Generation | —Unverified | 0 |
| Exploiting the Textual Potential from Vision-Language Pre-training for Text-based Person Search | Mar 8, 2023 | Attribute, Person Search | —Unverified | 0 |
| CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination | Aug 18, 2024 | Knowledge Distillation, Transfer Learning | —Unverified | 0 |
| Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval | Sep 21, 2023 | Retrieval, zero-shot-classification | —Unverified | 0 |