| Stable Audio Open | Jul 19, 2024 | Audio Generation, Text-to-Music Generation | Code Available | 7 | 5 |
| Long-form music generation with latent diffusion | Apr 16, 2024 | Audio Generation, Form | Code Available | 7 | 5 |
| AudioLM: a Language Modeling Approach to Audio Generation | Sep 7, 2022 | Audio Generation | Code Available | 7 | 5 |
| Fast Timing-Conditioned Latent Audio Diffusion | Feb 7, 2024 | Audio Generation, GPU | Code Available | 7 | 5 |
| Fast Text-to-Audio Generation with Adversarial Post-Training | May 13, 2025 | ARC, Audio Generation | Code Available | 7 | 5 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Dec 19, 2024 | Audio Generation, Audio Synthesis | Code Available | 7 | 5 |
| SoundStorm: Efficient Parallel Audio Generation | May 16, 2023 | Audio Generation | Code Available | 6 | 5 |
| AudioGen: Textually Guided Audio Generation | Sep 30, 2022 | Audio Generation, Descriptive | Code Available | 6 | 5 |
| InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation | Feb 28, 2025 | Audio Generation, Form | Code Available | 5 | 5 |
| FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Oct 16, 2024 | Audio Generation, GPU | Code Available | 5 | 5 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio Generation, Audio Synthesis | Code Available | 5 | 5 |
| Improving Text-To-Audio Models with Synthetic Captions | Jun 18, 2024 | AudioCaps, Audio captioning | Code Available | 5 | 5 |
| Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization | Apr 15, 2024 | Audio Generation | Code Available | 5 | 5 |
| ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Jun 26, 2025 | Audio Generation, Large Language Model | Code Available | 5 | 5 |
| TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Dec 30, 2024 | Audio Generation, GPU | Code Available | 4 | 5 |
| ArchiSound: Audio Generation with Diffusion | Jan 30, 2023 | Audio Generation, GPU | Code Available | 4 | 5 |
| SNAC: Multi-Scale Neural Audio Codec | Oct 18, 2024 | Audio Compression, Audio Generation | Code Available | 4 | 5 |
| AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Aug 10, 2023 | Audio Generation, In-Context Learning | Code Available | 4 | 5 |
| Latent Swap Joint Diffusion for 2D Long-Form Latent Generation | Feb 7, 2025 | Audio Generation, Denoising | Code Available | 4 | 5 |
| FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Jul 1, 2024 | Audio Generation, Video Alignment | Code Available | 4 | 5 |
| Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code Available | 4 | 5 |
| AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Jan 29, 2023 | AudioCaps, Audio Generation | Code Available | 4 | 5 |
| High-Fidelity Audio Compression with Improved RVQGAN | Jun 11, 2023 | Audio Compression, Audio Generation | Code Available | 3 | 5 |
| ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech | Sep 24, 2024 | Audio Generation | Code Available | 3 | 5 |
| Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Apr 24, 2023 | AudioCaps, Audio Generation | Code Available | 3 | 5 |
| Movie Gen: A Cast of Media Foundation Models | Oct 17, 2024 | Audio Generation, Video Editing | Code Available | 3 | 5 |
| BigVGAN: A Universal Neural Vocoder with Large-Scale Training | Jun 9, 2022 | Audio Generation, Audio Synthesis | Code Available | 3 | 5 |
| Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | Feb 19, 2024 | Audio Compression, Audio Generation | Code Available | 3 | 5 |
| OmniAudio: Generating Spatial Audio from 360-Degree Video | Apr 21, 2025 | Audio Generation | Code Available | 3 | 5 |
| Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Aug 30, 2024 | Audio Compression, Audio Generation | Code Available | 3 | 5 |
| ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2 | 5 |
| EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks | Jan 31, 2024 | Audio Generation, Speech Synthesis | Code Available | 2 | 5 |
| Symphony Generation with Permutation Invariant Language Model | May 10, 2022 | Audio Generation, Decoder | Code Available | 2 | 5 |
| SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation | May 28, 2024 | AudioCaps, Audio Generation | Code Available | 2 | 5 |
| Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2 | 5 |
| DDSP: Differentiable Digital Signal Processing | Jan 14, 2020 | Audio Generation, Audio Synthesis | Code Available | 2 | 5 |
| Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer | Jun 3, 2024 | Audio Generation, In-Context Learning | Code Available | 2 | 5 |
| RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction | Mar 8, 2024 | Audio Generation, Computational Efficiency | Code Available | 2 | 5 |
| Gotta Hear Them All: Sound Source Aware Vision to Audio Generation | Nov 23, 2024 | All, Audio Generation | Code Available | 2 | 5 |
| Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | Jan 30, 2023 | Audio Generation, Text-to-Video Generation | Code Available | 2 | 5 |
| Diffsound: Discrete Diffusion Model for Text-to-sound Generation | Jul 20, 2022 | Audio Generation, Decoder | Code Available | 2 | 5 |
| RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Jan 2, 2025 | Audio Generation, text-to-speech | Code Available | 2 | 5 |
| Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Oct 28, 2024 | Audio Deepfake Detection, Audio Generation | Code Available | 2 | 5 |
| PodAgent: A Comprehensive Framework for Podcast Generation | Mar 1, 2025 | Audio Generation, Speech Synthesis | Code Available | 2 | 5 |
| Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | Jan 2, 2024 | Audio Generation, cross-modal alignment | Code Available | 2 | 5 |
| Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching | Jun 1, 2024 | Audio Generation, Video-to-Sound Generation | Code Available | 2 | 5 |
| Baichuan-Omni-1.5 Technical Report | Jan 26, 2025 | Audio Generation | Code Available | 2 | 5 |
| Efficient Autoregressive Audio Modeling via Next-Scale Prediction | Aug 16, 2024 | Audio Generation, FAD | Code Available | 2 | 5 |
| KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Feb 21, 2025 | Audio Generation, FAD | Code Available | 2 | 5 |
| SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound | Jun 6, 2024 | Audio Generation | Code Available | 2 | 5 |