| ADD 2022: the First Audio Deep Synthesis Detection Challenge | Feb 17, 2022 | Audio Deepfake Detection, Audio Generation | —Unverified | 0 | 0 |
| A Demand-Driven Perspective on Generative Audio AI | Jul 10, 2023 | Audio Generation, Survey | —Unverified | 0 | 0 |
| Advances in machine-learning-based sampling motivated by lattice quantum chromodynamics | Sep 3, 2023 | Audio Generation | —Unverified | 0 | 0 |
| Adversarial Audio Synthesis with Complex-valued Polynomial Networks | Jun 14, 2022 | Audio Generation, Audio Synthesis | —Unverified | 0 | 0 |
| Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models | Sep 28, 2024 | Audio Generation, Language Modeling | —Unverified | 0 | 0 |
| Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization | Feb 3, 2024 | Audio Generation, Denoising | —Unverified | 0 | 0 |
| Animate and Sound an Image | Jan 1, 2025 | Audio Generation | —Unverified | 0 | 0 |
| An investigation of pre-upsampling generative modelling and Generative Adversarial Networks in audio super resolution | Sep 30, 2021 | Audio Generation, Audio Super-Resolution | —Unverified | 0 | 0 |
| Applications and Advances of Artificial Intelligence in Music Generation: A Review | Sep 3, 2024 | Audio Generation, Music Generation | —Unverified | 0 | 0 |
| A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations | Jun 6, 2025 | Audio Generation, Text Generation | —Unverified | 0 | 0 |
| A Survey of Deep Learning Audio Generation Methods | May 31, 2024 | Audio Generation, Deep Learning | —Unverified | 0 | 0 |
| Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition | Oct 4, 2024 | Audio Generation, Language Modeling | —Unverified | 0 | 0 |
| Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation | Nov 7, 2024 | Audio Generation, Large Language Model | —Unverified | 0 | 0 |
| Audiobox: Unified Audio Generation with Natural Language Prompts | Dec 25, 2023 | AudioCaps, Audio Generation | —Unverified | 0 | 0 |
| AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions | Sep 19, 2024 | Audio Generation | —Unverified | 0 | 0 |
| Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder | Aug 16, 2020 | Audio Dequantization, Audio Generation | —Unverified | 0 | 0 |
| Audio Editing with Non-Rigid Text Prompts | Oct 19, 2023 | Audio Generation, Style Transfer | —Unverified | 0 | 0 |
| Audio Generation with Multiple Conditional Diffusion Model | Aug 23, 2023 | Audio Generation, Diversity | —Unverified | 0 | 0 |
| AudioSpa: Spatializing Sound Events with Text | Feb 16, 2025 | Audio Generation, Data Augmentation | —Unverified | 0 | 0 |
| AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | May 28, 2025 | AudioCaps, Audio Generation | —Unverified | 0 | 0 |
| AudioX: Diffusion Transformer for Anything-to-Audio Generation | Mar 13, 2025 | Audio Generation, Music Generation | —Unverified | 0 | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | —Unverified | 0 | 0 |
| AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation | Jun 11, 2024 | Audio Generation, Video Generation | —Unverified | 0 | 0 |
| Bass Accompaniment Generation via Latent Diffusion | Feb 2, 2024 | Audio Generation | —Unverified | 0 | 0 |
| Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models | Mar 2, 2024 | Audio Generation, Conditional Image Generation | —Unverified | 0 | 0 |
| Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation | Oct 14, 2024 | Audio Generation, multimodal generation | —Unverified | 0 | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | —Unverified | 0 | 0 |
| C3LLM: Conditional Multimodal Content Generation Using Large Language Models | May 25, 2024 | Audio Generation, Language Modelling | —Unverified | 0 | 0 |
| CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation | Jan 6, 2025 | Audio Generation, Contrastive Learning | —Unverified | 0 | 0 |
| Classification Diffusion Models: Revitalizing Density Ratio Estimation | Feb 15, 2024 | Audio Generation, Classification | —Unverified | 0 | 0 |
| CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation | Nov 22, 2017 | Audio Generation, Generative Adversarial Network | —Unverified | 0 | 0 |
| CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling | Dec 8, 2023 | Audio Generation | —Unverified | 0 | 0 |
| Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio | Jun 12, 2024 | Audio Deepfake Detection, Audio Generation | —Unverified | 0 | 0 |
| Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding | Dec 5, 2024 | Audio Generation, Automatic Speech Recognition | —Unverified | 0 | 0 |
| Connective Viewpoints of Signal-to-Noise Diffusion Models | Aug 8, 2024 | Audio Generation | —Unverified | 0 | 0 |
| Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation | Nov 27, 2024 | Audio Generation | —Unverified | 0 | 0 |
| CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions | Jan 28, 2025 | Audio captioning, Audio Generation | —Unverified | 0 | 0 |
| CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis | Jun 14, 2021 | Audio Generation, Audio Synthesis | —Unverified | 0 | 0 |
| Creative Text-to-Audio Generation via Synthesizer Programming | Jun 1, 2024 | Audio Generation, Audio Synthesis | —Unverified | 0 | 0 |
| Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation | Nov 13, 2023 | Attribute, Audio Generation | —Unverified | 0 | 0 |
| Cross-modal variational inference for bijective signal-symbol translation | Feb 10, 2020 | Audio Generation, Density Estimation | —Unverified | 0 | 0 |
| Cyclic Learning for Binaural Audio Generation and Localization | Jan 1, 2024 | Audio Generation, Object | —Unverified | 0 | 0 |
| DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | Mar 28, 2025 | Audio Generation, Audio-Visual Synchronization | —Unverified | 0 | 0 |
| DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos | Mar 28, 2025 | Audio Generation, Large Language Model | —Unverified | 0 | 0 |
| Demystifying the Communication Characteristics for Distributed Transformer Models | Aug 19, 2024 | Audio Generation, GPU | —Unverified | 0 | 0 |
| Depth Infused Binaural Audio Generation using Hierarchical Cross-Modal Attention | Aug 10, 2021 | Audio Generation, Decoder | —Unverified | 0 | 0 |
| DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization | Jun 3, 2025 | Audio Generation, Audio Source Separation | —Unverified | 0 | 0 |
| Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection | Oct 4, 2024 | Anomaly Detection, Audio Generation | —Unverified | 0 | 0 |
| DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment | May 22, 2023 | AudioCaps, Audio Generation | —Unverified | 0 | 0 |
| DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap | Mar 15, 2025 | AudioCaps, Audio Generation | —Unverified | 0 | 0 |