| ADIFF: Explaining audio difference using natural language | Feb 6, 2025 | AudioCaps, Audio captioning | Code Available | 1 |
| LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Jan 16, 2025 | AudioCaps, Audio captioning | Code Available | 1 |
| Language-based Audio Retrieval with Co-Attention Networks | Dec 30, 2024 | AudioCaps, Learning Semantic Representations | Unverified | 0 |
| ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2 |
| Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning | Oct 14, 2024 | AudioCaps, Audio captioning | Unverified | 0 |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCaps, Audio captioning | Code Available | 0 |
| DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | Sep 16, 2024 | AudioCaps, Retrieval | Unverified | 0 |
| EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Sep 2, 2024 | AudioCaps, Audio captioning | Code Available | 2 |
| Dissecting Temporal Understanding in Text-to-Audio Retrieval | Sep 1, 2024 | AudioCaps, Retrieval | Unverified | 0 |
| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Aug 21, 2024 | AudioCaps, Contrastive Learning | Code Available | 0 |