| Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls | Feb 14, 2024 | Audio Generation, Music Generation | Code Available | 1 |
| Fast Timing-Conditioned Latent Audio Diffusion | Feb 7, 2024 | Audio Generation, GPU | Code Available | 7 |
| Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization | Feb 3, 2024 | Audio Generation, Denoising | Unverified | 0 |
| Bass Accompaniment Generation via Latent Diffusion | Feb 2, 2024 | Audio Generation | Unverified | 0 |
| EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks | Jan 31, 2024 | Audio Generation, Speech Synthesis | Code Available | 2 |
| T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis | Jan 17, 2024 | Audio Generation, Audio Synthesis | Code Available | 1 |
| ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering | Jan 14, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| Masked Audio Generation using a Single Non-Autoregressive Transformer | Jan 9, 2024 | Audio Generation | Unverified | 0 |
| Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | Jan 2, 2024 | Audio Generation, cross-modal alignment | Code Available | 2 |
| Efficient Parallel Audio Generation using Group Masked Language Modeling | Jan 2, 2024 | Audio Generation, Computational Efficiency | Unverified | 0 |
| Cyclic Learning for Binaural Audio Generation and Localization | Jan 1, 2024 | Audio Generation, Object | Unverified | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | Dec 25, 2023 | AudioCaps, Audio Generation | Unverified | 0 |
| Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models | Dec 24, 2023 | Audio Generation, Denoising | Unverified | 0 |
| CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling | Dec 8, 2023 | Audio Generation | Unverified | 0 |
| SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement | Dec 4, 2023 | Audio Generation, Speech Enhancement | Unverified | 0 |
| tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models | Nov 24, 2023 | Audio Generation, Event Detection | Unverified | 0 |
| Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation | Nov 13, 2023 | Attribute, Audio Generation | Unverified | 0 |
| On The Open Prompt Challenge In Conditional Audio Generation | Nov 1, 2023 | Audio Generation | Unverified | 0 |
| In-Context Prompt Editing For Conditional Audio Generation | Nov 1, 2023 | Audio Generation, Retrieval | Unverified | 0 |
| Audio Editing with Non-Rigid Text Prompts | Oct 19, 2023 | Audio Generation, Style Transfer | Unverified | 0 |
| Speech collage: code-switched audio generation by collaging monolingual corpora | Sep 27, 2023 | Audio Generation, Automatic Speech Recognition | Code Available | 1 |
| Invisible Watermarking for Audio Generation Diffusion Models | Sep 22, 2023 | Audio Generation | Code Available | 1 |
| FoleyGen: Visually-Guided Audio Generation | Sep 19, 2023 | Audio Generation, Language Modeling | Unverified | 0 |
| ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | Sep 19, 2023 | AudioCaps, Audio Generation | Code Available | 1 |
| Enhance audio generation controllability through representation similarity regularization | Sep 15, 2023 | Audio Generation, Language Modeling | Unverified | 0 |
| Retrieval-Augmented Text-to-Audio Generation | Sep 14, 2023 | AudioCaps, Audio Generation | Unverified | 0 |
| Advances in machine-learning-based sampling motivated by lattice quantum chromodynamics | Sep 3, 2023 | Audio Generation | Unverified | 0 |
| WavMark: Watermarking for Audio Generation | Aug 24, 2023 | Audio Generation | Code Available | 2 |
| Audio Generation with Multiple Conditional Diffusion Model | Aug 23, 2023 | Audio Generation, Diversity | Unverified | 0 |
| An Initial Exploration: Learning to Generate Realistic Audio for Silent Video | Aug 23, 2023 | Audio Generation | Code Available | 0 |
| V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models | Aug 18, 2023 | Audio Generation, Video-to-Sound Generation | Code Available | 1 |
| AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Aug 10, 2023 | Audio Generation, In-Context Learning | Code Available | 4 |
| MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | Aug 3, 2023 | Audio Generation, Beat Tracking | Code Available | 1 |
| WavJourney: Compositional Audio Creation with Large Language Models | Jul 26, 2023 | Audio Generation | Code Available | 2 |
| IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models | Jul 24, 2023 | Audio Generation, Music Generation | Unverified | 0 |
| A Demand-Driven Perspective on Generative Audio AI | Jul 10, 2023 | Audio Generation, Survey | Unverified | 0 |
| LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models | Jun 18, 2023 | Audio Generation, Disentanglement | Unverified | 0 |
| High-Fidelity Audio Compression with Improved RVQGAN | Jun 11, 2023 | Audio Compression, Audio Generation | Code Available | 3 |
| MuseCoco: Generating Symbolic Music from Text | May 31, 2023 | Attribute, Audio Generation | Code Available | 0 |
| Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | May 29, 2023 | Audio Generation, Denoising | Code Available | 1 |
| An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | May 26, 2023 | Audio Generation, Inference Attack | Code Available | 1 |
| DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment | May 22, 2023 | AudioCaps, Audio Generation | Unverified | 0 |
| Any-to-Any Generation via Composable Diffusion | May 19, 2023 | Audio Generation | Code Available | 1 |
| SoundStorm: Efficient Parallel Audio Generation | May 16, 2023 | Audio Generation | Code Available | 6 |
| LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music | May 1, 2023 | Audio Generation, Information Retrieval | Code Available | 1 |
| Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Apr 24, 2023 | AudioCaps, Audio Generation | Code Available | 3 |
| Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code Available | 4 |
| Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation | Mar 29, 2023 | Audio Generation, Contrastive Learning | Code Available | 0 |
| Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study | Mar 7, 2023 | Audio Generation, Benchmarking | Unverified | 0 |
| AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis | Feb 4, 2023 | 3D geometry, Audio Generation | Code Available | 1 |