| Masked Audio Generation using a Single Non-Autoregressive Transformer | Jan 9, 2024 | Audio Generation | Unverified | 0 |
| MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation | Oct 3, 2024 | Audio Generation, Denoising | Unverified | 0 |
| MEDIC: Zero-shot Music Editing with Disentangled Inversion Control | Jul 18, 2024 | Audio Generation | Unverified | 0 |
| MetaBGM: Dynamic Soundtrack Transformation For Continuous Multi-Scene Experiences With Ambient Awareness And Personalization | Sep 5, 2024 | Audio Generation | Unverified | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | May 23, 2025 | Audio Generation, Benchmarking | Unverified | 0 |
| Modeling and Driving Human Body Soundfields through Acoustic Primitives | Jul 18, 2024 | Audio Generation, Neural Rendering | Unverified | 0 |
| Music Style Transfer With Diffusion Model | Apr 23, 2024 | Audio Generation, model | Unverified | 0 |
| NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization | Sep 19, 2024 | Audio Compression, Audio Generation | Unverified | 0 |
| Neural Granular Sound Synthesis | Aug 4, 2020 | Audio Generation | Unverified | 0 |
| Nonparametric estimation of a factorizable density using diffusion models | Jan 3, 2025 | Audio Generation, Density Estimation | Unverified | 0 |
| NU-GAN: High resolution neural upsampling with GAN | Oct 22, 2020 | Audio Generation, Speech Synthesis | Unverified | 0 |
| On Target Representation in Continuous-output Neural Machine Translation | May 1, 2022 | Audio Generation, Machine Translation | Unverified | 0 |
| On the Design of Diffusion-based Neural Speech Codecs | Apr 11, 2025 | Audio Generation, Image Generation | Unverified | 0 |
| On The Open Prompt Challenge In Conditional Audio Generation | Nov 1, 2023 | Audio Generation | Unverified | 0 |
| PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation | Nov 13, 2024 | Audio Generation, Diversity | Unverified | 0 |
| TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | May 12, 2025 | Audio captioning, Audio Generation | Unverified | 0 |
| TA-V2A: Textually Assisted Video-to-Audio Generation | Mar 12, 2025 | Audio Generation | Unverified | 0 |
| Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation | Sep 14, 2024 | Audio Generation, Style Transfer | Unverified | 0 |
| Text-to-Audio Generation Synchronized with Videos | Mar 8, 2024 | AudioCaps, Audio Generation | Unverified | 0 |
| The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge | Oct 31, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation | May 23, 2024 | Audio Generation | Unverified | 0 |
| tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models | Nov 24, 2023 | Audio Generation, Event Detection | Unverified | 0 |
| Towards efficient quantum algorithms for diffusion probability models | Feb 20, 2025 | Audio Generation | Unverified | 0 |
| Transferring neural speech waveform synthesizers to musical instrument sounds generation | Oct 27, 2019 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control | Dec 29, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio | May 19, 2025 | Audio Generation, Information Retrieval | Unverified | 0 |
| UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation | Feb 6, 2025 | Audio Generation, Diversity | Unverified | 0 |
| (Un)paired signal-to-signal translation with 1D conditional GANs | Mar 5, 2024 | Audio Generation, Generative Adversarial Network | Unverified | 0 |
| Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound | Aug 21, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Video-Guided Foley Sound Generation with Multimodal Controls | Nov 26, 2024 | Audio Generation | Unverified | 0 |
| Video-to-Audio Generation with Fine-grained Temporal Semantics | Sep 23, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| Video-to-Audio Generation with Hidden Alignment | Jul 10, 2024 | Audio Generation, Data Augmentation | Unverified | 0 |
| VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation | Dec 14, 2024 | Audio Generation | Unverified | 0 |
| ViSAGe: Video-to-Spatial Audio Generation | Jun 13, 2025 | Audio Generation | Unverified | 0 |
| Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation | May 23, 2024 | Audio Generation, Denoising | Unverified | 0 |
| Visually Informed Binaural Audio Generation without Binaural Audios | Apr 13, 2021 | Audio Generation | Unverified | 0 |
| Voice command generation using Progressive Wavegans | Mar 13, 2019 | Audio Generation, Image Generation | Unverified | 0 |
| VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis | Dec 26, 2024 | Audio Generation, Speech Synthesis | Unverified | 0 |
| Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients | May 6, 2025 | Audio Generation, Denoising | Unverified | 0 |
| FolAI: Synchronized Foley Sound Generation with Semantic and Temporal Alignment | Dec 19, 2024 | Audio Generation | Unverified | 0 |
| Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance | Jun 26, 2025 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Streamable Neural Audio Synthesis With Non-Causal Convolutions | Apr 14, 2022 | Audio Generation, Audio Synthesis | Unverified | 0 |
| SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning | Oct 16, 2022 | Audio Generation, Representation Learning | Unverified | 0 |
| Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition | Mar 10, 2025 | Audio Generation, Quantization | Unverified | 0 |
| Synthetic training set generation using text-to-audio models for environmental sound classification | Mar 26, 2024 | Audio Generation, Classification | Unverified | 0 |
| Audio Deepfake Attribution: An Initial Dataset and Investigation | Aug 21, 2022 | Audio Generation, Binary Classification | Unverified | 0 |
| Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation | Mar 29, 2023 | Audio Generation, Contrastive Learning | Code Available | 0 |
| Adversarial Generation of Time-Frequency Features with application in audio synthesis | Feb 11, 2019 | Audio Generation, Audio Synthesis | Code Available | 0 |
| XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark | May 31, 2025 | Audio Generation, Face Swapping | Code Available | 0 |
| Stochastic Diffusion: A Diffusion Probabilistic Model for Stochastic Time Series Forecasting | Jun 5, 2024 | Audio Generation, Time Series | Code Available | 0 |