| MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling | Jul 11, 2025 | Audio Synthesis, Language Modelling | Unverified | 0 |
| Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance | Jun 26, 2025 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Diffusion-Based Symbolic Regression | May 30, 2025 | Audio Synthesis, Denoising | Unverified | 0 |
| SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet | May 22, 2025 | Audio Synthesis | Unverified | 0 |
| Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism | May 20, 2025 | Audio Synthesis, Denoising | Unverified | 0 |
| DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis | May 14, 2025 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates | May 9, 2025 | Audio Synthesis, CPU | Code Available | 1 |
| Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis | Apr 18, 2025 | Audio Synthesis | Unverified | 0 |
| TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis | Apr 8, 2025 | Audio Synthesis, FAD | Unverified | 0 |
| Designing Neural Synthesizers for Low-Latency Interaction | Mar 14, 2025 | Audio Synthesis | Unverified | 0 |
| Long-Video Audio Synthesis with Multi-Agent Collaboration | Mar 13, 2025 | Audio Synthesis, Scene Segmentation | Unverified | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | Feb 26, 2025 | Audio Synthesis, Automatic Speech Recognition | Unverified | 0 |
| XAttnMark: Learning Robust Audio Watermarking with Cross-Attention | Feb 6, 2025 | Audio Synthesis, Face Swapping | Unverified | 0 |
| Generative diffusion model with inverse renormalization group flows | Jan 15, 2025 | Audio Synthesis, Denoising | Code Available | 1 |
| Customized Condition Controllable Generation for Video Soundtrack | Jan 1, 2025 | Audio Synthesis | Unverified | 0 |
| Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control | Dec 29, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Dec 19, 2024 | Audio Generation, Audio Synthesis | Code Available | 7 |
| CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder | Dec 12, 2024 | Audio Synthesis, Singing Voice Synthesis | Unverified | 0 |
| Zero-Shot Mono-to-Binaural Speech Synthesis | Dec 11, 2024 | Audio Synthesis, Denoising | Unverified | 0 |
| Generalized Diffusion Model with Adjusted Offset Noise | Dec 4, 2024 | Audio Synthesis, Drug Discovery | Unverified | 0 |
| OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows | Dec 2, 2024 | Audio Synthesis, Image Generation | Code Available | 2 |
| VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space | Nov 22, 2024 | Audio Synthesis, Decoder | Unverified | 0 |
| Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement | Oct 22, 2024 | Audio Synthesis, Diversity | Unverified | 0 |
| Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals | Oct 8, 2024 | Audio Synthesis | Unverified | 0 |
| Where are we in audio deepfake detection? A systematic analysis over generative and detection models | Oct 6, 2024 | Audio Deepfake Detection, Audio Synthesis | Code Available | 1 |
| PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models | Sep 20, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning | Sep 19, 2024 | Audio Synthesis | Code Available | 1 |
| D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Sep 11, 2024 | Adversarial Attack, Audio Synthesis | Unverified | 0 |
| Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | Sep 10, 2024 | Audio Synthesis, Audio-Visual Synchronization | Unverified | 0 |
| Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP | Sep 4, 2024 | Audio Synthesis, Computational Efficiency | Unverified | 0 |
| Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music | Aug 22, 2024 | Audio Synthesis | Unverified | 0 |
| Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound | Aug 21, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | Jul 30, 2024 | Audio Synthesis, Video Summarization | Unverified | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis | Jul 15, 2024 | Audio Synthesis, Decoder | Unverified | 0 |
| LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis | Jul 15, 2024 | Audio Generation, Audio Synthesis | Code Available | 1 |
| Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2 |
| AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis | Jun 13, 2024 | Audio Synthesis, NeRF | Unverified | 0 |
| CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems | Jun 11, 2024 | Audio Synthesis, Face Swapping | Unverified | 0 |
| Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis | Jun 7, 2024 | Audio Synthesis | Code Available | 2 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio Generation, Audio Synthesis | Code Available | 5 |
| Creative Text-to-Audio Generation via Synthesizer Programming | Jun 1, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting | May 30, 2024 | Audio Synthesis, Representation Learning | Unverified | 0 |
| Differentiable All-pole Filters for Time-varying Audio Systems | Apr 11, 2024 | All, Audio Effects Modeling | Code Available | 2 |
| Diffusion-TS: Interpretable Diffusion for General Time Series Generation | Mar 4, 2024 | Audio Synthesis, Decoder | Code Available | 3 |
| Text2Data: Low-Resource Data Generation with Textual Control | Feb 8, 2024 | Audio Synthesis, Time Series | Unverified | 0 |
| DiffMoog: a Differentiable Modular Synthesizer for Sound Matching | Jan 23, 2024 | Audio Synthesis | Code Available | 2 |
| T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis | Jan 17, 2024 | Audio Generation, Audio Synthesis | Code Available | 1 |
| FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models | Dec 13, 2023 | 3D Face Animation, Audio Synthesis | Code Available | 2 |
| Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions | Oct 21, 2023 | Audio Synthesis, Generative Adversarial Network | Unverified | 0 |