| ImageBind: One Embedding Space To Bind Them All | May 9, 2023 | All, Cross-Modal Retrieval | Code Available | 5 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Oct 3, 2023 | Audio Classification, Contrastive Learning | Code Available | 4 |
| WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research | Mar 30, 2023 | Audio captioning, Event Detection | Code Available | 2 |
| ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Sep 13, 2024 | Audio Classification, Descriptive | Code Available | 1 |
| Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer | Dec 16, 2021 | Audio Classification, Audio Tagging | Code Available | 1 |
| Sound-Guided Semantic Image Manipulation | Nov 30, 2021 | Audio Classification, image-classification | Code Available | 1 |
| TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification | Dec 31, 2024 | Audio Classification, Classification | Unverified | 0 |
| A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification | Sep 19, 2024 | Audio Classification, Classification | Unverified | 0 |
| Multi-label Zero-Shot Audio Classification with Temporal Attention | Aug 31, 2024 | Audio Classification, Classification | Unverified | 0 |
| Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs | Aug 17, 2024 | Audio Classification, Contrastive Learning | Code Available | 0 |