| ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding | May 14, 2023 | 3D Classification, 3D Point Cloud Classification | Code Available | 2 |
| Boosting Visual-Language Models by Exploiting Hard Samples | May 9, 2023 | Retrieval, zero-shot-classification | Code Available | 0 |
| The Benefits of Label-Description Training for Zero-Shot Text Classification | May 3, 2023 | Classification, domain classification | Code Available | 0 |
| Unsupervised Improvement of Audio-Text Cross-Modal Representations | May 3, 2023 | Acoustic Scene Classification, Classification | Code Available | 0 |
| The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks | Apr 26, 2023 | Data Augmentation, Language Modelling | Code Available | 1 |
| CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval | Apr 21, 2023 | Data Augmentation, Information Retrieval | Code Available | 0 |
| WYTIWYR: A User Intent-Aware Framework with Multi-modal Inputs for Visualization Retrieval | Apr 14, 2023 | Retrieval, zero-shot-classification | Code Available | 0 |
| SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval) | Apr 13, 2023 | Classification, Sentiment Analysis | Code Available | 1 |
| What does CLIP know about a red circle? Visual prompt engineering for VLMs | Apr 13, 2023 | Image Generation, Prompt Engineering | Unverified | 0 |
| RECLIP: Resource-efficient CLIP by Training with Small Images | Apr 12, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |