SOTAVerified

Audio Generation

Audio generation (synthesis) is the task of generating raw audio such as speech.


Papers

Showing 226–250 of 270 papers

| Title | Status | Hype |
|---|---|---|
| Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation | | 0 |
| LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights | | 0 |
| Learning Source Disentanglement in Neural Audio Codec | | 0 |
| Leveraging AI to Generate Audio for User-generated Content in Video Games | | 0 |
| Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study | | 0 |
| LiLAC: A Lightweight Latent ControlNet for Musical Audio Generation | | 0 |
| LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models | | 0 |
| Make Some Noise: Towards LLM audio reasoning and generation using sound tokens | | 0 |
| Masked Audio Generation using a Single Non-Autoregressive Transformer | | 0 |
| MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation | | 0 |
| MEDIC: Zero-shot Music Editing with Disentangled Inversion Control | | 0 |
| MetaBGM: Dynamic Soundtrack Transformation For Continuous Multi-Scene Experiences With Ambient Awareness And Personalization | | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | | 0 |
| Modeling and Driving Human Body Soundfields through Acoustic Primitives | | 0 |
| Music Source Separation in the Waveform Domain | | 0 |
| Music Style Transfer With Diffusion Model | | 0 |
| NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization | | 0 |
| Neural Granular Sound Synthesis | | 0 |
| Nonparametric estimation of a factorizable density using diffusion models | | 0 |
| NU-GAN: High resolution neural upsampling with GAN | | 0 |
| On Target Representation in Continuous-output Neural Machine Translation | | 0 |
| On the Design of Diffusion-based Neural Speech Codecs | | 0 |
| On The Open Prompt Challenge In Conditional Audio Generation | | 0 |
| PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation | | 0 |
| PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior | | 0 |
Page 10 of 11

Benchmark Results

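| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AudioGen | FD_openl3 | 185.53 | | Unverified |
| 2 | AudioLDM2-large | FD_openl3 | 158.04 | | Unverified |
| 3 | Stable Audio 2.0 | FD_openl3 | 110.62 | | Unverified |
| 4 | Stable Audio | FD_openl3 | 103.66 | | Unverified |
| 5 | ETTA | FD_openl3 | 80.13 | | Unverified |
| 6 | TangoFlux-base | FD_openl3 | 79.7 | | Unverified |
| 7 | Stable Audio Open | FD_openl3 | 78.24 | | Unverified |
| 8 | TangoFlux | FD_openl3 | 75.1 | | Unverified |
| 9 | ETTA-FT-AC-100k | FD_openl3 | 61.79 | | Unverified |
| 10 | Diffsound | FAD | 7.75 | | Unverified |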

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VAB-Encodec (Ours) | Bits per byte | 40 | | Unverified |
| 2 | Sparse Transformer 152M (strided) | Bits per byte | 1.97 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SymphonyNet | Human listening average results | 3.5 | | Unverified |