| ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2 | 5 |
| WavMark: Watermarking for Audio Generation | Aug 24, 2023 | Audio Generation | Code Available | 2 | 5 |
| Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2 | 5 |
| KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Feb 21, 2025 | Audio Generation, FAD | Code Available | 2 | 5 |
| RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Jan 2, 2025 | Audio Generation, text-to-speech | Code Available | 2 | 5 |
| Unsupervised Source Separation By Steering Pretrained Music Models | Oct 25, 2021 | Audio Generation, Audio Source Separation | Code Available | 1 | 5 |
| Adversarial Audio Synthesis | Feb 12, 2018 | Audio Generation, Audio Synthesis | Code Available | 1 | 5 |
| V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models | Aug 18, 2023 | Audio Generation, Video-to-Sound Generation | Code Available | 1 | 5 |
| ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | Sep 19, 2023 | AudioCaps, Audio Generation | Code Available | 1 | 5 |
| RiTTA: Modeling Event Relations in Text-to-Audio Generation | Dec 20, 2024 | Audio Generation, Relation | Code Available | 1 | 5 |
| BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation | Jun 11, 2025 | Audio Generation, FAD | Code Available | 1 | 5 |
| Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization | May 18, 2020 | Audio Generation, Generative Adversarial Network | Code Available | 1 | 5 |
| Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls | Feb 14, 2024 | Audio Generation, Music Generation | Code Available | 1 | 5 |
| WaveNet: A Generative Model for Raw Audio | Sep 12, 2016 | Audio Generation, model | Code Available | 1 | 5 |
| AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis | Feb 4, 2023 | 3D geometry, Audio Generation | Code Available | 1 | 5 |
| Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion | Jun 3, 2019 | Audio Generation, Voice Conversion | Code Available | 1 | 5 |
| ADIFF: Explaining audio difference using natural language | Feb 6, 2025 | AudioCaps, Audio captioning | Code Available | 1 | 5 |
| Any-to-Any Generation via Composable Diffusion | May 19, 2023 | Audio Generation | Code Available | 1 | 5 |
| Read, Watch and Scream! Sound Generation from Text and Video | Jul 8, 2024 | Audio Generation, Triplet | Code Available | 1 | 5 |
| Perceiving Music Quality with GANs | Jun 11, 2020 | Audio Generation, Audio Quality Assessment | Code Available | 1 | 5 |
| Anytime Sampling for Autoregressive Models via Ordered Autoencoding | Feb 23, 2021 | Audio Generation, Computational Efficiency | Code Available | 1 | 5 |
| Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training | Dec 3, 2020 | Audio Generation, Disentanglement | Code Available | 1 | 5 |
| RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses | Nov 1, 2021 | Audio Generation, Generative Adversarial Network | Code Available | 1 | 5 |
| T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis | Jan 17, 2024 | Audio Generation, Audio Synthesis | Code Available | 1 | 5 |
| Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus | Dec 20, 2021 | Audio Generation, Singing Voice Synthesis | Code Available | 1 | 5 |
| Differentiable Time-Frequency Scattering on GPU | Apr 18, 2022 | Audio Generation, CPU | Code Available | 1 | 5 |
| MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Jul 30, 2024 | Audio Generation, Image to Video Generation | Code Available | 1 | 5 |
| Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | May 29, 2023 | Audio Generation, Denoising | Code Available | 1 | 5 |
| Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception | Apr 9, 2025 | All, Audio Deepfake Detection | Code Available | 1 | 5 |
| MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | Aug 3, 2023 | Audio Generation, Beat Tracking | Code Available | 1 | 5 |
| Tell What You Hear From What You See -- Video to Audio Generation Through Text | Nov 8, 2024 | Audio captioning, Audio Generation | Code Available | 1 | 5 |
| Localize to Binauralize: Audio Spatialization From Visual Sound Source Localization | Jan 1, 2021 | Audio Generation, Sound Source Localization | Code Available | 1 | 5 |
| LLMBind: A Unified Modality-Task Integration Framework | Feb 22, 2024 | AI Agent, Audio Generation | Code Available | 1 | 5 |
| An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | May 26, 2023 | Audio Generation, Inference Attack | Code Available | 1 | 5 |
| LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music | May 1, 2023 | Audio Generation, Information Retrieval | Code Available | 1 | 5 |
| LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation | Jun 12, 2024 | Audio Generation | Code Available | 1 | 5 |
| LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis | Jul 15, 2024 | Audio Generation, Audio Synthesis | Code Available | 1 | 5 |
| Taming Visually Guided Sound Generation | Oct 17, 2021 | Audio Generation, GPU | Code Available | 1 | 5 |
| Temporally Aligned Audio for Video with Autoregression | Sep 20, 2024 | Audio Generation, Video-to-Sound Generation | Code Available | 1 | 5 |
| HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement | Mar 24, 2022 | Audio Generation, Bandwidth Extension | Code Available | 1 | 5 |
| GACELA -- A generative adversarial context encoder for long audio inpainting | May 11, 2020 | Audio Generation, Audio inpainting | Code Available | 1 | 5 |
| Invisible Watermarking for Audio Generation Diffusion Models | Sep 22, 2023 | Audio Generation | Code Available | 1 | 5 |
| It's Raw! Audio Generation with State-Space Models | Feb 20, 2022 | Audio Generation, Density Estimation | Code Available | 1 | 5 |
| From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation | Sep 27, 2024 | Audio Classification, Audio Generation | Code Available | 1 | 5 |
| Speech collage: code-switched audio generation by collaging monolingual corpora | Sep 27, 2023 | Audio Generation, Automatic Speech Recognition | Code Available | 1 | 5 |
| Neural Waveshaping Synthesis | Jul 11, 2021 | Audio Generation, Audio Synthesis | Code Available | 1 | 5 |
| Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation | Aug 2, 2024 | Attribute, Audio Generation | Code Available | 1 | 5 |
| Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization | Mar 28, 2025 | Audio Generation, FAD | Code Available | 1 | 5 |
| Audeo: Audio Generation for a Silent Performance Video | Jun 23, 2020 | Audio Generation, Audio Synthesis | Code Available | 1 | 5 |
| Catch-A-Waveform: Learning to Generate Audio from a Single Short Example | Jun 11, 2021 | Audio Generation, Semantic Similarity | Code Available | 1 | 5 |