SOTAVerified

Audio Generation

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Papers

Showing 201–250 of 270 papers

Title | Status | Hype
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | Code | 2
ArchiSound: Audio Generation with Diffusion | Code | 4
SingSong: Generating musical accompaniments from singing | - | 0
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Code | 4
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning | - | 0
AudioGen: Textually Guided Audio Generation | Code | 6
AudioLM: a Language Modeling Approach to Audio Generation | Code | 7
Audio Deepfake Attribution: An Initial Dataset and Investigation | - | 0
Diffsound: Discrete Diffusion Model for Text-to-sound Generation | Code | 2
Adversarial Audio Synthesis with Complex-valued Polynomial Networks | - | 0
BigVGAN: A Universal Neural Vocoder with Large-Scale Training | Code | 3
FlexLip: A Controllable Text-to-Lip System | - | 0
Symphony Generation with Permutation Invariant Language Model | Code | 2
On Target Representation in Continuous-output Neural Machine Translation | - | 0
Differentiable Time-Frequency Scattering on GPU | Code | 1
Streamable Neural Audio Synthesis With Non-Causal Convolutions | - | 0
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement | Code | 1
It's Raw! Audio Generation with State-Space Models | Code | 1
ADD 2022: the First Audio Deep Synthesis Detection Challenge | - | 0
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus | Code | 1
Soundify: Matching Sound Effects to Video | - | 0
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video | - | 0
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses | Code | 1
Unsupervised Source Separation By Steering Pretrained Music Models | Code | 1
Taming Visually Guided Sound Generation | Code | 1
An investigation of pre-upsampling generative modelling and Generative Adversarial Networks in audio super resolution | - | 0
Depth Infused Binaural Audio Generation using Hierarchical Cross-Modal Attention | - | 0
Neural Waveshaping Synthesis | Code | 1
CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis | - | 0
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior | Code | 0
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example | Code | 1
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation | - | 0
Visually Informed Binaural Audio Generation without Binaural Audios | - | 0
Anytime Sampling for Autoregressive Models via Ordered Autoencoding | Code | 1
Localize to Binauralize: Audio Spatialization From Visual Sound Source Localization | Code | 1
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training | Code | 1
A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions | - | 0
NU-GAN: High resolution neural upsampling with GAN | - | 0
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder | - | 0
Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning | - | 0
Neural Granular Sound Synthesis | - | 0
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation | - | 0
Audeo: Audio Generation for a Silent Performance Video | Code | 1
Perceiving Music Quality with GANs | Code | 1
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder | - | 0
Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization | Code | 1
GACELA -- A generative adversarial context encoder for long audio inpainting | Code | 1
Guided Generative Adversarial Neural Network for Representation Learning and High Fidelity Audio Generation using Fewer Labelled Audio Data | - | 0
Cross-modal variational inference for bijective signal-symbol translation | - | 0
FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AudioGen | FD_openl3 | 185.53 | - | Unverified
2 | AudioLDM2-large | FD_openl3 | 158.04 | - | Unverified
3 | Stable Audio 2.0 | FD_openl3 | 110.62 | - | Unverified
4 | Stable Audio | FD_openl3 | 103.66 | - | Unverified
5 | ETTA | FD_openl3 | 80.13 | - | Unverified
6 | TangoFlux-base | FD_openl3 | 79.7 | - | Unverified
7 | Stable Audio Open | FD_openl3 | 78.24 | - | Unverified
8 | TangoFlux | FD_openl3 | 75.1 | - | Unverified
9 | ETTA-FT-AC-100k | FD_openl3 | 61.79 | - | Unverified
10 | Diffsound | FAD | 7.75 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAB-Encodec (Ours) | Bits per byte | 40 | - | Unverified
2 | Sparse Transformer 152M (strided) | Bits per byte | 1.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SymphonyNet | Human listening average results | 3.5 | - | Unverified