| Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps, Audio captioning | Code Available | 1 |
| Prefix tuning for automated audio captioning | Mar 30, 2023 | AudioCaps, Audio captioning | Code Available | 1 |
| Audio Captioning Transformer | Jul 21, 2021 | AudioCaps, Audio captioning | Code Available | 1 |
| LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Jan 16, 2025 | AudioCaps, Audio captioning | Code Available | 1 |
| Separate What You Describe: Language-Queried Audio Source Separation | Mar 28, 2022 | AudioCaps, Audio Source Separation | Code Available | 1 |
| Target Sound Extraction with Variable Cross-modality Clues | Mar 15, 2023 | AudioCaps, Target Sound Extraction | Code Available | 1 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | Dec 25, 2023 | AudioCaps, Audio Generation | Unverified | 0 |
| FLAP: Fast Language-Audio Pre-training | Nov 2, 2023 | AudioCaps, Contrastive Learning | Unverified | 0 |
| Dissecting Temporal Understanding in Text-to-Audio Retrieval | Sep 1, 2024 | AudioCaps, Retrieval | Unverified | 0 |
| DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap | Mar 15, 2025 | AudioCaps, Audio Generation | Unverified | 0 |