| FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation | Jul 11, 2025 | Audio Generation, Data Augmentation | —Unverified | 0 |
| Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance | Jun 26, 2025 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation | Jun 24, 2025 | Audio Generation, Audio-Visual Synchronization | —Unverified | 0 |
| ViSAGe: Video-to-Spatial Audio Generation | Jun 13, 2025 | Audio Generation | —Unverified | 0 |
| LiLAC: A Lightweight Latent ControlNet for Musical Audio Generation | Jun 13, 2025 | Audio Generation | —Unverified | 0 |
| A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations | Jun 6, 2025 | Audio Generation, Text Generation | —Unverified | 0 |
| Sounding that Object: Interactive Object-Aware Image to Audio Generation | Jun 4, 2025 | Audio Generation, Image Segmentation | —Unverified | 0 |
| InfiniteAudio: Infinite-Length Audio Generation with Consistency | Jun 3, 2025 | Audio Generation, Denoising | —Unverified | 0 |
| DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization | Jun 3, 2025 | Audio Generation, Audio Source Separation | —Unverified | 0 |
| IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling | May 31, 2025 | AudioCaps, Audio Generation | —Unverified | 0 |
| XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark | May 31, 2025 | Audio Generation, Face Swapping | Code Available | 0 |
| AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | May 28, 2025 | AudioCaps, Audio Generation | —Unverified | 0 |
| Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance | May 27, 2025 | Audio Generation, Denoising | Code Available | 0 |
| EnvSDD: Benchmarking Environmental Sound Deepfake Detection | May 25, 2025 | Audio Deepfake Detection, Audio Generation | —Unverified | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | May 23, 2025 | Audio Generation, Benchmarking | —Unverified | 0 |
| Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio | May 19, 2025 | Audio Generation, Information Retrieval | —Unverified | 0 |
| DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis | May 14, 2025 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | May 12, 2025 | Audio captioning, Audio Generation | —Unverified | 0 |
| Discrete Optimal Transport and Voice Conversion | May 7, 2025 | Audio Generation, Voice Conversion | —Unverified | 0 |
| Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous Gradients | May 6, 2025 | Audio Generation, Denoising | —Unverified | 0 |
| On the Design of Diffusion-based Neural Speech Codecs | Apr 11, 2025 | Audio Generation, Image Generation | —Unverified | 0 |
| Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models | Apr 6, 2025 | Audio Generation, GPU | —Unverified | 0 |
| Make Some Noise: Towards LLM audio reasoning and generation using sound tokens | Mar 28, 2025 | Audio Generation, Quantization | —Unverified | 0 |
| DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos | Mar 28, 2025 | Audio Generation, Large Language Model | —Unverified | 0 |
| DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | Mar 28, 2025 | Audio Generation, Audio-Visual Synchronization | —Unverified | 0 |
| DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap | Mar 15, 2025 | AudioCaps, Audio Generation | —Unverified | 0 |
| AudioX: Diffusion Transformer for Anything-to-Audio Generation | Mar 13, 2025 | Audio Generation, Music Generation | —Unverified | 0 |
| TA-V2A: Textually Assisted Video-to-Audio Generation | Mar 12, 2025 | Audio Generation | —Unverified | 0 |
| Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition | Mar 10, 2025 | Audio Generation, Quantization | —Unverified | 0 |
| ReelWave: Multi-Agentic Movie Sound Generation through Multimodal LLM Conversation | Mar 10, 2025 | Audio Generation | —Unverified | 0 |
| Speech Audio Generation from dynamic MRI via a Knowledge Enhanced Conditional Variational Autoencoder | Mar 9, 2025 | Audio Generation, Denoising | —Unverified | 0 |
| DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model | Feb 26, 2025 | Audio Generation, Large Language Model | —Unverified | 0 |
| Towards efficient quantum algorithms for diffusion probability models | Feb 20, 2025 | Audio Generation | —Unverified | 0 |
| AudioSpa: Spatializing Sound Events with Text | Feb 16, 2025 | Audio Generation, Data Augmentation | —Unverified | 0 |
| UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation | Feb 6, 2025 | Audio Generation, Diversity | —Unverified | 0 |
| AudioGenX: Explainability on Text-to-Audio Generative Models | Feb 1, 2025 | Audio Generation, counterfactual | Code Available | 0 |
| CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions | Jan 28, 2025 | Audio captioning, Audio Generation | —Unverified | 0 |
| Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization | Jan 22, 2025 | Audio Generation, Retrieval | Code Available | 0 |
| CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation | Jan 6, 2025 | Audio Generation, Contrastive Learning | —Unverified | 0 |
| Nonparametric estimation of a factorizable density using diffusion models | Jan 3, 2025 | Audio Generation, Density Estimation | —Unverified | 0 |
| Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows | Jan 1, 2025 | Audio Generation, Contrastive Learning | —Unverified | 0 |
| Animate and Sound an Image | Jan 1, 2025 | Audio Generation | —Unverified | 0 |
| Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control | Dec 29, 2024 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis | Dec 26, 2024 | Audio Generation, Speech Synthesis | —Unverified | 0 |
| Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance | Dec 24, 2024 | Audio Generation, Video Alignment | —Unverified | 0 |
| FolAI: Synchronized Foley Sound Generation with Semantic and Temporal Alignment | Dec 19, 2024 | Audio Generation | —Unverified | 0 |
| VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation | Dec 14, 2024 | Audio Generation | —Unverified | 0 |
| YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls | Dec 12, 2024 | Audio Generation | —Unverified | 0 |
| Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding | Dec 5, 2024 | Audio Generation, Automatic Speech Recognition | —Unverified | 0 |
| Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation | Nov 27, 2024 | Audio Generation | —Unverified | 0 |