| Dissecting Temporal Understanding in Text-to-Audio Retrieval | Sep 1, 2024 | AudioCaps, Retrieval | Unverified | 0 |
| Do Audio-Language Models Understand Linguistic Variations? | Oct 21, 2024 | Contrastive Learning, Natural Language Queries | Unverified | 0 |
| Exploring Train and Test-Time Augmentations for Audio-Language Learning | Oct 31, 2022 | Audio captioning, Audio to Text Retrieval | Unverified | 0 |
| Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval | Oct 6, 2022 | Metric Learning, Retrieval | Unverified | 0 |
| The language of sound search: Examining User Queries in Audio Search Engines | Oct 10, 2024 | Retrieval, Survey | Unverified | 0 |
| Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets | Aug 8, 2023 | Retrieval, Text to Audio Retrieval | Code Available | 0 |
| M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Mar 28, 2025 | Audio captioning, Audio Classification | Code Available | 0 |
| Evaluation of pretrained language models on music understanding | Sep 17, 2024 | Music Captioning, Negation | Code Available | 0 |
| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Aug 21, 2024 | AudioCaps, Contrastive Learning | Code Available | 0 |
| OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation | Jul 1, 2021 | Audio to Text Retrieval, Cross-Modal Retrieval | Code Available | 0 |