| Video-Guided Foley Sound Generation with Multimodal Controls | Nov 26, 2024 | Audio Generation | Unverified | 0 |
| PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation | Nov 13, 2024 | Audio Generation, Diversity | Unverified | 0 |
| Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation | Nov 7, 2024 | Audio Generation, Large Language Model | Unverified | 0 |
| The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge | Oct 31, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation | Oct 23, 2024 | Audio Generation | Code Available | 0 |
| Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation | Oct 14, 2024 | Audio Generation, multimodal generation | Unverified | 0 |
| Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection | Oct 4, 2024 | Anomaly Detection, Audio Generation | Unverified | 0 |
| Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition | Oct 4, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation | Oct 3, 2024 | Audio Generation, Denoising | Unverified | 0 |
| Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models | Sep 28, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| Video-to-Audio Generation with Fine-grained Temporal Semantics | Sep 23, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models | Sep 20, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions | Sep 19, 2024 | Audio Generation | Unverified | 0 |
| NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization | Sep 19, 2024 | Audio Compression, Audio Generation | Unverified | 0 |
| Learning Source Disentanglement in Neural Audio Codec | Sep 17, 2024 | Audio Compression, Audio Generation | Unverified | 0 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio Generation, Caption Generation | Unverified | 0 |
| Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation | Sep 14, 2024 | Audio Generation, Style Transfer | Unverified | 0 |
| MetaBGM: Dynamic Soundtrack Transformation For Continuous Multi-Scene Experiences With Ambient Awareness And Personalization | Sep 5, 2024 | Audio Generation | Unverified | 0 |
| Applications and Advances of Artificial Intelligence in Music Generation: A Review | Sep 3, 2024 | Audio Generation, Music Generation | Unverified | 0 |
| Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound | Aug 21, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Demystifying the Communication Characteristics for Distributed Transformer Models | Aug 19, 2024 | Audio Generation, GPU | Unverified | 0 |
| Connective Viewpoints of Signal-to-Noise Diffusion Models | Aug 8, 2024 | Audio Generation | Unverified | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Modeling and Driving Human Body Soundfields through Acoustic Primitives | Jul 18, 2024 | Audio Generation, Neural Rendering | Unverified | 0 |
| MEDIC: Zero-shot Music Editing with Disentangled Inversion Control | Jul 18, 2024 | Audio Generation | Unverified | 0 |
| Video-to-Audio Generation with Hidden Alignment | Jul 10, 2024 | Audio Generation, Data Augmentation | Unverified | 0 |
| SOAF: Scene Occlusion-aware Neural Acoustic Field | Jul 2, 2024 | Audio Generation | Unverified | 0 |
| Provable Statistical Rates for Consistency Diffusion Models | Jun 23, 2024 | Audio Generation | Unverified | 0 |
| Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling? | Jun 13, 2024 | Audio Generation, Data Augmentation | Code Available | 0 |
| Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | Jun 13, 2024 | Audio Generation, Retrieval-augmented Generation | Unverified | 0 |
| Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio | Jun 12, 2024 | Audio Deepfake Detection, Audio Generation | Unverified | 0 |
| AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation | Jun 11, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | Unverified | 0 |
| Stochastic Diffusion: A Diffusion Probabilistic Model for Stochastic Time Series Forecasting | Jun 5, 2024 | Audio Generation, Time Series | Code Available | 0 |
| Creative Text-to-Audio Generation via Synthesizer Programming | Jun 1, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| A Survey of Deep Learning Audio Generation Methods | May 31, 2024 | Audio Generation, Deep Learning | Unverified | 0 |
| C3LLM: Conditional Multimodal Content Generation Using Large Language Models | May 25, 2024 | Audio Generation, Language Modelling | Unverified | 0 |
| Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation | May 23, 2024 | Audio Generation, Denoising | Unverified | 0 |
| The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation | May 23, 2024 | Audio Generation | Unverified | 0 |
| Prompt-guided Precise Audio Editing with Diffusion Models | May 11, 2024 | Audio Generation | Unverified | 0 |
| Leveraging AI to Generate Audio for User-generated Content in Video Games | Apr 25, 2024 | Audio Generation, Game Design | Unverified | 0 |
| Music Style Transfer With Diffusion Model | Apr 23, 2024 | Audio Generation, model | Unverified | 0 |
| LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search | Apr 22, 2024 | Audio Generation, Deep Learning | Code Available | 0 |
| LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights | Apr 18, 2024 | Audio Generation, Image Generation | Unverified | 0 |
| Synthetic training set generation using text-to-audio models for environmental sound classification | Mar 26, 2024 | Audio Generation, Classification | Unverified | 0 |
| Text-to-Audio Generation Synchronized with Videos | Mar 8, 2024 | AudioCaps, Audio Generation | Unverified | 0 |
| (Un)paired signal-to-signal translation with 1D conditional GANs | Mar 5, 2024 | Audio Generation, Generative Adversarial Network | Unverified | 0 |
| Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models | Mar 2, 2024 | Audio Generation, Conditional Image Generation | Unverified | 0 |
| Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners | Feb 27, 2024 | Audio Generation, Denoising | Unverified | 0 |
| Classification Diffusion Models: Revitalizing Density Ratio Estimation | Feb 15, 2024 | Audio Generation, Classification | Unverified | 0 |