| Efficient Autoregressive Audio Modeling via Next-Scale Prediction | Aug 16, 2024 | Audio Generation, FAD | Code Available | 2 |
| Connective Viewpoints of Signal-to-Noise Diffusion Models | Aug 8, 2024 | Audio Generation | Unverified | 0 |
| Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation | Aug 2, 2024 | Attribute, Audio Generation | Code Available | 1 |
| MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Jul 30, 2024 | Audio Generation, Image to Video Generation | Code Available | 1 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Stable Audio Open | Jul 19, 2024 | Audio Generation, Text-to-Music Generation | Code Available | 7 |
| MEDIC: Zero-shot Music Editing with Disentangled Inversion Control | Jul 18, 2024 | Audio Generation | Unverified | 0 |
| Modeling and Driving Human Body Soundfields through Acoustic Primitives | Jul 18, 2024 | Audio Generation, Neural Rendering | Unverified | 0 |
| LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis | Jul 15, 2024 | Audio Generation, Audio Synthesis | Code Available | 1 |
| Video-to-Audio Generation with Hidden Alignment | Jul 10, 2024 | Audio Generation, Data Augmentation | Unverified | 0 |
| Read, Watch and Scream! Sound Generation from Text and Video | Jul 8, 2024 | Audio Generation, Triplet | Code Available | 1 |
| SOAF: Scene Occlusion-aware Neural Acoustic Field | Jul 2, 2024 | Audio Generation | Unverified | 0 |
| FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Jul 1, 2024 | Audio Generation, Video Alignment | Code Available | 4 |
| Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2 |
| Provable Statistical Rates for Consistency Diffusion Models | Jun 23, 2024 | Audio Generation | Unverified | 0 |
| Improving Text-To-Audio Models with Synthetic Captions | Jun 18, 2024 | AudioCaps, Audio captioning | Code Available | 5 |
| Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling? | Jun 13, 2024 | Audio Generation, Data Augmentation | Code Available | 0 |
| Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | Jun 13, 2024 | Audio Generation, Retrieval-augmented Generation | Unverified | 0 |
| LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation | Jun 12, 2024 | Audio Generation | Code Available | 1 |
| Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio | Jun 12, 2024 | Audio Deepfake Detection, Audio Generation | Unverified | 0 |
| AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation | Jun 11, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Jun 8, 2024 | Audio Generation, Decoder | Unverified | 0 |
| SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound | Jun 6, 2024 | Audio Generation | Code Available | 2 |
| Stochastic Diffusion: A Diffusion Probabilistic Model for Stochastic Time Series Forecasting | Jun 5, 2024 | Audio Generation, Time Series | Code Available | 0 |
| Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer | Jun 3, 2024 | Audio Generation, In-Context Learning | Code Available | 2 |
| Creative Text-to-Audio Generation via Synthesizer Programming | Jun 1, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio Generation, Audio Synthesis | Code Available | 5 |
| Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching | Jun 1, 2024 | Audio Generation, Video-to-Sound Generation | Code Available | 2 |
| A Survey of Deep Learning Audio Generation Methods | May 31, 2024 | Audio Generation, Deep Learning | Unverified | 0 |
| SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation | May 28, 2024 | AudioCaps, Audio Generation | Code Available | 2 |
| C3LLM: Conditional Multimodal Content Generation Using Large Language Models | May 25, 2024 | Audio Generation, Language Modelling | Unverified | 0 |
| The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation | May 23, 2024 | Audio Generation | Unverified | 0 |
| Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation | May 23, 2024 | Audio Generation, Denoising | Unverified | 0 |
| Prompt-guided Precise Audio Editing with Diffusion Models | May 11, 2024 | Audio Generation | Unverified | 0 |
| The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio | May 8, 2024 | Audio Deepfake Detection, Audio Generation | Code Available | 2 |
| Leveraging AI to Generate Audio for User-generated Content in Video Games | Apr 25, 2024 | Audio Generation, Game Design | Unverified | 0 |
| Music Style Transfer With Diffusion Model | Apr 23, 2024 | Audio Generation, model | Unverified | 0 |
| LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search | Apr 22, 2024 | Audio Generation, Deep Learning | Code Available | 0 |
| LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights | Apr 18, 2024 | Audio Generation, Image Generation | Unverified | 0 |
| Long-form music generation with latent diffusion | Apr 16, 2024 | Audio Generation, Form | Code Available | 7 |
| Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization | Apr 15, 2024 | Audio Generation | Code Available | 5 |
| Synthetic training set generation using text-to-audio models for environmental sound classification | Mar 26, 2024 | Audio Generation, Classification | Unverified | 0 |
| Text-to-Audio Generation Synchronized with Videos | Mar 8, 2024 | AudioCaps, Audio Generation | Unverified | 0 |
| RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction | Mar 8, 2024 | Audio Generation, Computational Efficiency | Code Available | 2 |
| (Un)paired signal-to-signal translation with 1D conditional GANs | Mar 5, 2024 | Audio Generation, Generative Adversarial Network | Unverified | 0 |
| Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models | Mar 2, 2024 | Audio Generation, Conditional Image Generation | Unverified | 0 |
| Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners | Feb 27, 2024 | Audio Generation, Denoising | Unverified | 0 |
| LLMBind: A Unified Modality-Task Integration Framework | Feb 22, 2024 | AI Agent, Audio Generation | Code Available | 1 |
| Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | Feb 19, 2024 | Audio Compression, Audio Generation | Code Available | 3 |
| Classification Diffusion Models: Revitalizing Density Ratio Estimation | Feb 15, 2024 | Audio Generation, Classification | Unverified | 0 |