| Text-to-Audio Generation Synchronized with Videos | Mar 8, 2024 | AudioCaps, Audio Generation | Unverified | 0 |
| Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model | Mar 12, 2025 | AudioCaps, Contrastive Learning | Unverified | 0 |
| AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps, Audio captioning | Unverified | 0 |
| Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer | Aug 20, 2023 | AudioCaps, Audio captioning | Unverified | 0 |
| Retrieval-Augmented Text-to-Audio Generation | Sep 14, 2023 | AudioCaps, Audio Generation | Unverified | 0 |
| Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning | Feb 8, 2025 | AudioCaps, Audio captioning | Unverified | 0 |
| Audio-text Retrieval in Context | Mar 25, 2022 | AudioCaps, Retrieval | Unverified | 0 |
| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Aug 21, 2024 | AudioCaps, Contrastive Learning | Code Available | 0 |
| Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps, Contrastive Learning | Code Available | 0 |
| MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Jun 15, 2024 | AudioCaps, Image Generation | Code Available | 0 |
| Automated Audio Captioning by Fine-Tuning BART with AudioSet Tags | Nov 15, 2021 | AudioCaps, Audio captioning | Code Available | 0 |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCaps, Audio captioning | Code Available | 0 |
| ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Feb 20, 2025 | AudioCaps, Contrastive Learning | Code Available | 0 |
| Weakly-supervised Automated Audio Captioning via text only training | Sep 21, 2023 | AudioCaps, Audio captioning | Code Available | 0 |