| Progressive Upsampling Audio Synthesis via Effective Adversarial Training | Sep 25, 2019 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| Prompt-guided Precise Audio Editing with Diffusion Models | May 11, 2024 | Audio Generation | —Unverified | 0 |
| Provable Statistical Rates for Consistency Diffusion Models | Jun 23, 2024 | Audio Generation | —Unverified | 0 |
| PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models | Sep 20, 2024 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| ReelWave: Multi-Agentic Movie Sound Generation through Multimodal LLM Conversation | Mar 10, 2025 | Audio Generation | —Unverified | 0 |
| Retrieval-Augmented Text-to-Audio Generation | Sep 14, 2023 | AudioCaps, Audio Generation | —Unverified | 0 |
| Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners | Feb 27, 2024 | Audio Generation, Denoising | —Unverified | 0 |
| SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement | Dec 4, 2023 | Audio Generation, Speech Enhancement | —Unverified | 0 |
| Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation | Jul 20, 2020 | Audio Generation | —Unverified | 0 |
| SingSong: Generating musical accompaniments from singing | Jan 30, 2023 | Audio Generation, Retrieval | —Unverified | 0 |
| Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance | Dec 24, 2024 | Audio Generation, Video Alignment | —Unverified | 0 |
| SOAF: Scene Occlusion-aware Neural Acoustic Field | Jul 2, 2024 | Audio Generation | —Unverified | 0 |
| Soundify: Matching Sound Effects to Video | Dec 17, 2021 | Audio Generation, Image Classification | —Unverified | 0 |
| Sounding that Object: Interactive Object-Aware Image to Audio Generation | Jun 4, 2025 | Audio Generation, Image Segmentation | —Unverified | 0 |
| Speech Audio Generation from dynamic MRI via a Knowledge Enhanced Conditional Variational Autoencoder | Mar 9, 2025 | Audio Generation, Denoising | —Unverified | 0 |
| YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls | Dec 12, 2024 | Audio Generation | —Unverified | 0 |
| Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models | Apr 6, 2025 | Audio Generation, GPU | —Unverified | 0 |
| A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions | Nov 13, 2020 | Audio Generation, Music Generation | —Unverified | 0 |
| Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | Jun 13, 2024 | Audio Generation, Retrieval-augmented Generation | —Unverified | 0 |
| ADD 2022: the First Audio Deep Synthesis Detection Challenge | Feb 17, 2022 | Audio Deepfake Detection, Audio Generation | —Unverified | 0 |
| A Demand-Driven Perspective on Generative Audio AI | Jul 10, 2023 | Audio Generation, Survey | —Unverified | 0 |
| Advances in machine-learning-based sampling motivated by lattice quantum chromodynamics | Sep 3, 2023 | Audio Generation | —Unverified | 0 |
| Adversarial Audio Synthesis with Complex-valued Polynomial Networks | Jun 14, 2022 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models | Sep 28, 2024 | Audio Generation, Language Modeling | —Unverified | 0 |
| Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization | Feb 3, 2024 | Audio Generation, Denoising | —Unverified | 0 |
| Animate and Sound an Image | Jan 1, 2025 | Audio Generation | —Unverified | 0 |
| An investigation of pre-upsampling generative modelling and Generative Adversarial Networks in audio super resolution | Sep 30, 2021 | Audio Generation, Audio Super-Resolution | —Unverified | 0 |
| Applications and Advances of Artificial Intelligence in Music Generation:A Review | Sep 3, 2024 | Audio Generation, Music Generation | —Unverified | 0 |
| A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations | Jun 6, 2025 | Audio Generation, Text Generation | —Unverified | 0 |
| A Survey of Deep Learning Audio Generation Methods | May 31, 2024 | Audio Generation, Deep Learning | —Unverified | 0 |
| Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition | Oct 4, 2024 | Audio Generation, Language Modeling | —Unverified | 0 |
| Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation | Nov 7, 2024 | Audio Generation, Large Language Model | —Unverified | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | Dec 25, 2023 | AudioCaps, Audio Generation | —Unverified | 0 |
| AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions | Sep 19, 2024 | Audio Generation | —Unverified | 0 |
| Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder | Aug 16, 2020 | Audio Dequantization, Audio Generation | —Unverified | 0 |
| Audio Editing with Non-Rigid Text Prompts | Oct 19, 2023 | Audio Generation, Style Transfer | —Unverified | 0 |
| Audio Generation with Multiple Conditional Diffusion Model | Aug 23, 2023 | Audio Generation, Diversity | —Unverified | 0 |
| AudioSpa: Spatializing Sound Events with Text | Feb 16, 2025 | Audio Generation, Data Augmentation | —Unverified | 0 |
| AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | May 28, 2025 | AudioCaps, Audio Generation | —Unverified | 0 |
| AudioX: Diffusion Transformer for Anything-to-Audio Generation | Mar 13, 2025 | Audio Generation, Music Generation | —Unverified | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | —Unverified | 0 |
| AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation | Jun 11, 2024 | Audio Generation, Video Generation | —Unverified | 0 |
| Bass Accompaniment Generation via Latent Diffusion | Feb 2, 2024 | Audio Generation | —Unverified | 0 |
| Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models | Mar 2, 2024 | Audio Generation, Conditional Image Generation | —Unverified | 0 |
| Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation | Oct 14, 2024 | Audio Generation, multimodal generation | —Unverified | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| C3LLM: Conditional Multimodal Content Generation Using Large Language Models | May 25, 2024 | Audio Generation, Language Modelling | —Unverified | 0 |
| CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation | Jan 6, 2025 | Audio Generation, Contrastive Learning | —Unverified | 0 |
| Classification Diffusion Models: Revitalizing Density Ratio Estimation | Feb 15, 2024 | Audio Generation, Classification | —Unverified | 0 |
| CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation | Nov 22, 2017 | Audio Generation, Generative Adversarial Network | —Unverified | 0 |