| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Aug 21, 2024 | AudioCaps, Contrastive Learning | Code Available | 0 |
| Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval | Jun 22, 2024 | AudioCaps, Retrieval | Unverified | 0 |
| MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation | Jun 15, 2024 | AudioCaps, Image Generation | Code Available | 0 |
| Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | Mar 15, 2024 | AudioCaps, Contrastive Learning | Unverified | 0 |
| Text-to-Audio Generation Synchronized with Videos | Mar 8, 2024 | AudioCaps, Audio Generation | Unverified | 0 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | Jan 22, 2024 | AudioCaps, Audio-Visual Synchronization | Unverified | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | Dec 25, 2023 | AudioCaps, Audio Generation | Unverified | 0 |
| Audio-Visual LLM for Video Understanding | Dec 11, 2023 | AudioCaps, Language Modeling | Unverified | 0 |
| FLAP: Fast Language-Audio Pre-training | Nov 2, 2023 | AudioCaps, Contrastive Learning | Unverified | 0 |
| Generation or Replication: Auscultating Audio Latent Diffusion Models | Oct 16, 2023 | AudioCaps, Memorization | Unverified | 0 |