| Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization | Jan 22, 2025 | Audio Generation, Retrieval | Code Available | 0 |
| CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation | Jan 6, 2025 | Audio Generation, Contrastive Learning | Unverified | 0 |
| Nonparametric estimation of a factorizable density using diffusion models | Jan 3, 2025 | Audio Generation, Density Estimation | Unverified | 0 |
| RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Jan 2, 2025 | Audio Generation, text-to-speech | Code Available | 2 |
| Animate and Sound an Image | Jan 1, 2025 | Audio Generation | Unverified | 0 |
| Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows | Jan 1, 2025 | Audio Generation, Contrastive Learning | Unverified | 0 |
| TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Dec 30, 2024 | Audio Generation, GPU | Code Available | 4 |
| Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control | Dec 29, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2 |
| VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis | Dec 26, 2024 | Audio Generation, Speech Synthesis | Unverified | 0 |
| Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance | Dec 24, 2024 | Audio Generation, Video Alignment | Unverified | 0 |
| RiTTA: Modeling Event Relations in Text-to-Audio Generation | Dec 20, 2024 | Audio Generation, Relation | Code Available | 1 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Dec 19, 2024 | Audio Generation, Audio Synthesis | Code Available | 7 |
| FolAI: Synchronized Foley Sound Generation with Semantic and Temporal Alignment | Dec 19, 2024 | Audio Generation | Unverified | 0 |
| VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation | Dec 14, 2024 | Audio Generation | Unverified | 0 |
| YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls | Dec 12, 2024 | Audio Generation | Unverified | 0 |
| Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding | Dec 5, 2024 | Audio Generation, Automatic Speech Recognition | Unverified | 0 |
| Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation | Nov 27, 2024 | Audio Generation | Unverified | 0 |
| Video-Guided Foley Sound Generation with Multimodal Controls | Nov 26, 2024 | Audio Generation | Unverified | 0 |
| Gotta Hear Them All: Sound Source Aware Vision to Audio Generation | Nov 23, 2024 | All, Audio Generation | Code Available | 2 |
| PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation | Nov 13, 2024 | Audio Generation, Diversity | Unverified | 0 |
| Tell What You Hear From What You See -- Video to Audio Generation Through Text | Nov 8, 2024 | Audio captioning, Audio Generation | Code Available | 1 |
| Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation | Nov 7, 2024 | Audio Generation, Large Language Model | Unverified | 0 |
| The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge | Oct 31, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Oct 28, 2024 | Audio Deepfake Detection, Audio Generation | Code Available | 2 |
| Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation | Oct 23, 2024 | Audio Generation | Code Available | 0 |
| SNAC: Multi-Scale Neural Audio Codec | Oct 18, 2024 | Audio Compression, Audio Generation | Code Available | 4 |
| Movie Gen: A Cast of Media Foundation Models | Oct 17, 2024 | Audio Generation, Video Editing | Code Available | 3 |
| FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Oct 16, 2024 | Audio Generation, GPU | Code Available | 5 |
| Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation | Oct 14, 2024 | Audio Generation, multimodal generation | Unverified | 0 |
| Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection | Oct 4, 2024 | Anomaly Detection, Audio Generation | Unverified | 0 |
| Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition | Oct 4, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation | Oct 3, 2024 | Audio Generation, Denoising | Unverified | 0 |
| Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models | Sep 28, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation | Sep 27, 2024 | Audio Classification, Audio Generation | Code Available | 1 |
| ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech | Sep 24, 2024 | Audio Generation | Code Available | 3 |
| Video-to-Audio Generation with Fine-grained Temporal Semantics | Sep 23, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| Temporally Aligned Audio for Video with Autoregression | Sep 20, 2024 | Audio Generation, Video-to-Sound Generation | Code Available | 1 |
| PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models | Sep 20, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization | Sep 19, 2024 | Audio Compression, Audio Generation | Unverified | 0 |
| AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions | Sep 19, 2024 | Audio Generation | Unverified | 0 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio Generation, Caption Generation | Unverified | 0 |
| Learning Source Disentanglement in Neural Audio Codec | Sep 17, 2024 | Audio Compression, Audio Generation | Unverified | 0 |
| Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation | Sep 14, 2024 | Audio Generation, Style Transfer | Unverified | 0 |
| TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Sep 12, 2024 | Audio Generation, Target Speaker Extraction | Code Available | 2 |
| MetaBGM: Dynamic Soundtrack Transformation For Continuous Multi-Scene Experiences With Ambient Awareness And Personalization | Sep 5, 2024 | Audio Generation | Unverified | 0 |
| Applications and Advances of Artificial Intelligence in Music Generation: A Review | Sep 3, 2024 | Audio Generation, Music Generation | Unverified | 0 |
| Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Aug 30, 2024 | Audio Compression, Audio Generation | Code Available | 3 |
| Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound | Aug 21, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Demystifying the Communication Characteristics for Distributed Transformer Models | Aug 19, 2024 | Audio Generation, GPU | Unverified | 0 |