| On Metric Learning for Audio-Text Cross-Modal Retrieval | Mar 29, 2022 | AudioCaps, Cross-Modal Retrieval | Code Available | 1 |
| Separate What You Describe: Language-Queried Audio Source Separation | Mar 28, 2022 | AudioCaps, Audio Source Separation | Code Available | 1 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | Dec 17, 2021 | AudioCaps, Audio captioning | Code Available | 1 |
| Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps, Audio captioning | Code Available | 1 |
| Audio Captioning Transformer | Jul 21, 2021 | AudioCaps, Audio captioning | Code Available | 1 |
| Audio Retrieval with Natural Language Queries | May 5, 2021 | AudioCaps, Audio to Text Retrieval | Code Available | 1 |
| AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps, Audio captioning | Unverified | 0 |
| IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling | May 31, 2025 | AudioCaps, Audio Generation | Unverified | 0 |
| Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning | May 28, 2025 | AudioCaps, Audio captioning | Unverified | 0 |
| AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | May 28, 2025 | AudioCaps, Audio Generation | Unverified | 0 |