| Audio Retrieval with Natural Language Queries | May 5, 2021 | AudioCapsAudio to Text Retrieval | CodeCode Available | 1 |
| AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCapsAudio captioning | —Unverified | 0 |
| IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling | May 31, 2025 | AudioCapsAudio Generation | —Unverified | 0 |
| AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | May 28, 2025 | AudioCapsAudio Generation | —Unverified | 0 |
| Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning | May 28, 2025 | AudioCapsAudio captioning | —Unverified | 0 |
| DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap | Mar 15, 2025 | AudioCapsAudio Generation | —Unverified | 0 |
| Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model | Mar 12, 2025 | AudioCapsContrastive Learning | —Unverified | 0 |
| TAIL: Text-Audio Incremental Learning | Mar 6, 2025 | AudioCapsIncremental Learning | —Unverified | 0 |
| ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Feb 20, 2025 | AudioCapsContrastive Learning | CodeCode Available | 0 |
| Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning | Feb 8, 2025 | AudioCapsAudio captioning | —Unverified | 0 |
| Language-based Audio Retrieval with Co-Attention Networks | Dec 30, 2024 | AudioCapsLearning Semantic Representations | —Unverified | 0 |
| Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning | Oct 14, 2024 | AudioCapsAudio captioning | —Unverified | 0 |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | Oct 12, 2024 | AudioCapsAudio captioning | CodeCode Available | 0 |
| DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | Sep 16, 2024 | AudioCapsRetrieval | —Unverified | 0 |
| Dissecting Temporal Understanding in Text-to-Audio Retrieval | Sep 1, 2024 | AudioCapsRetrieval | —Unverified | 0 |
| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Aug 21, 2024 | AudioCapsContrastive Learning | CodeCode Available | 0 |
| Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval | Jun 22, 2024 | AudioCapsRetrieval | —Unverified | 0 |
| MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Jun 15, 2024 | AudioCapsImage Generation | CodeCode Available | 0 |
| Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | Mar 15, 2024 | AudioCapsContrastive Learning | —Unverified | 0 |
| Text-to-Audio Generation Synchronized with Videos | Mar 8, 2024 | AudioCapsAudio Generation | —Unverified | 0 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | Jan 22, 2024 | AudioCapsAudio-Visual Synchronization | —Unverified | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | Dec 25, 2023 | AudioCapsAudio Generation | —Unverified | 0 |
| Audio-Visual LLM for Video Understanding | Dec 11, 2023 | AudioCapsLanguage Modeling | —Unverified | 0 |
| FLAP: Fast Language-Audio Pre-training | Nov 2, 2023 | AudioCapsContrastive Learning | —Unverified | 0 |
| Generation or Replication: Auscultating Audio Latent Diffusion Models | Oct 16, 2023 | AudioCapsMemorization | —Unverified | 0 |