| Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | Aug 20, 2023 | AudioCaps / Audio captioning | Unverified | 0 |
| DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment | May 22, 2023 | AudioCaps / Audio Generation | Unverified | 0 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi / Action Classification | Code Available | 3 |
| Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Apr 24, 2023 | AudioCaps / Audio Generation | Code Available | 3 |
| Prefix tuning for automated audio captioning | Mar 30, 2023 | AudioCaps / Audio captioning | Code Available | 1 |
| Target Sound Extraction with Variable Cross-modality Clues | Mar 15, 2023 | AudioCaps / Target Sound Extraction | Code Available | 1 |
| Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps / Contrastive Learning | Code Available | 0 |
| AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Jan 29, 2023 | AudioCaps / Audio Generation | Code Available | 4 |
| Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates | Nov 14, 2022 | AudioCaps / Audio captioning | Code Available | 1 |
| Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Oct 28, 2022 | AudioCaps / Audio captioning | Code Available | 1 |