| MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Jun 15, 2024 | AudioCaps, Image Generation | Code Available | 0 | 5 |
| Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps, Contrastive Learning | Code Available | 0 | 5 |
| Automated Audio Captioning by Fine-Tuning BART with AudioSet Tags | Nov 15, 2021 | AudioCaps, Audio captioning | Code Available | 0 | 5 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | Dec 25, 2023 | AudioCaps, Audio Generation | Unverified | 0 | 0 |
| IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling | May 31, 2025 | AudioCaps, Audio Generation | Unverified | 0 | 0 |
| Joint Speech Recognition and Audio Captioning | Feb 3, 2022 | AudioCaps, Audio captioning | Unverified | 0 | 0 |
| Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval? | Aug 29, 2023 | AudioCaps, Audio captioning | Unverified | 0 | 0 |
| Language-based Audio Retrieval with Co-Attention Networks | Dec 30, 2024 | AudioCaps, Learning Semantic Representations | Unverified | 0 | 0 |
| TAIL: Text-Audio Incremental Learning | Mar 6, 2025 | AudioCaps, Incremental Learning | Unverified | 0 | 0 |
| Leveraging Pre-trained BERT for Audio Captioning | Mar 6, 2022 | AudioCaps, Audio captioning | Unverified | 0 | 0 |