| Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps, Audio captioning | Code Available | 1 | 5 |
| Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates | Nov 14, 2022 | AudioCaps, Audio captioning | Code Available | 1 | 5 |
| Target Sound Extraction with Variable Cross-modality Clues | Mar 15, 2023 | AudioCaps, Target Sound Extraction | Code Available | 1 | 5 |
| Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Oct 28, 2022 | AudioCaps, Audio captioning | Code Available | 1 | 5 |
| Audio Retrieval with Natural Language Queries | May 5, 2021 | AudioCaps, Audio to Text Retrieval | Code Available | 1 | 5 |
| Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | May 16, 2024 | AudioCaps, Event Detection | Code Available | 1 | 5 |
| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Aug 21, 2024 | AudioCaps, Contrastive Learning | Code Available | 0 | 5 |
| ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Feb 20, 2025 | AudioCaps, Contrastive Learning | Code Available | 0 | 5 |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCaps, Audio captioning | Code Available | 0 | 5 |
| MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Jun 15, 2024 | AudioCaps, Image Generation | Code Available | 0 | 5 |