| It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap | May 28, 2024 | image-classification, Image Classification | Unverified | 0 |
| MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding | May 28, 2024 | 3D Classification, 3D Object Recognition | Unverified | 0 |
| Listenable Maps for Zero-Shot Audio Classifiers | May 27, 2024 | Decoder, zero-shot-classification | Unverified | 0 |
| CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification | May 26, 2024 | zero-shot-classification, Zero-Shot Learning | Code Available | 0 |
| BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection | May 24, 2024 | Contrastive Learning, Language Modelling | Unverified | 0 |
| What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | May 24, 2024 | Classification, image-classification | Code Available | 0 |
| Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations | May 23, 2024 | Contrastive Learning, Instance Segmentation | Code Available | 0 |
| Tuning-free Universally-Supervised Semantic Segmentation | May 23, 2024 | Segmentation, Semantic Segmentation | Unverified | 0 |
| CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | May 14, 2024 | Depth Estimation, object-detection | Unverified | 0 |
| Stylometric Watermarks for Large Language Models | May 14, 2024 | Sentence, zero-shot-classification | Unverified | 0 |