SOTAVerified

Audio Generation

Audio generation (synthesis) is the task of generating raw audio, such as speech.
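In concrete terms, "raw audio" is a sequence of amplitude samples taken at a fixed sample rate. The sketch below uses only the Python standard library. It takes a list of floats in [-1, 1], a common convention for waveform model output, and stores it as a 16-bit PCM WAV file. A synthetic 440 Hz tone stands in for a model's output. The 16 kHz sample rate and the helper name `to_pcm16_wav` are illustrative assumptions, not taken from any listed paper.

```python
import array
import io
import math
import wave
from typing import Sequence

SAMPLE_RATE = 16_000  # samples per second; an assumed rate, common for speech


def to_pcm16_wav(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes (hypothetical helper)."""
    # Clip each sample to [-1, 1], then scale it to a signed 16-bit integer.
    ints = array.array("h", (round(32767 * max(-1.0, min(1.0, s))) for s in samples))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)  # mono
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        # array.tobytes() uses native byte order, which is what wave expects.
        w.writeframes(ints.tobytes())
    return buf.getvalue()


# Stand-in for model output: one second of a 440 Hz sine tone at half amplitude.
waveform = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
wav_bytes = to_pcm16_wav(waveform)
print(len(wav_bytes))  # 44-byte WAV header + 2 bytes per sample
```

A real system would replace `waveform` with the samples produced by its vocoder or waveform model. Many systems generate at 22.05, 24, or 44.1 kHz rather than 16 kHz.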

(Image credit: MelNet)

Papers

Showing 201–225 of 270 papers

Title | Status | Hype
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | Code | 2
ArchiSound: Audio Generation with Diffusion | Code | 4
SingSong: Generating musical accompaniments from singing | - | 0
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Code | 4
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning | - | 0
AudioGen: Textually Guided Audio Generation | Code | 6
AudioLM: a Language Modeling Approach to Audio Generation | Code | 7
Audio Deepfake Attribution: An Initial Dataset and Investigation | - | 0
Diffsound: Discrete Diffusion Model for Text-to-sound Generation | Code | 2
Adversarial Audio Synthesis with Complex-valued Polynomial Networks | - | 0
BigVGAN: A Universal Neural Vocoder with Large-Scale Training | Code | 3
FlexLip: A Controllable Text-to-Lip System | - | 0
Symphony Generation with Permutation Invariant Language Model | Code | 2
On Target Representation in Continuous-output Neural Machine Translation | - | 0
Differentiable Time-Frequency Scattering on GPU | Code | 1
Streamable Neural Audio Synthesis With Non-Causal Convolutions | - | 0
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement | Code | 1
It's Raw! Audio Generation with State-Space Models | Code | 1
ADD 2022: the First Audio Deep Synthesis Detection Challenge | - | 0
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus | Code | 1
Soundify: Matching Sound Effects to Video | - | 0
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video | - | 0
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses | Code | 1
Unsupervised Source Separation By Steering Pretrained Music Models | Code | 1
Taming Visually Guided Sound Generation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AudioGen | FD_openl3 | 185.53 | - | Unverified
2 | AudioLDM2-large | FD_openl3 | 158.04 | - | Unverified
3 | Stable Audio 2.0 | FD_openl3 | 110.62 | - | Unverified
4 | Stable Audio | FD_openl3 | 103.66 | - | Unverified
5 | ETTA | FD_openl3 | 80.13 | - | Unverified
6 | TangoFlux-base | FD_openl3 | 79.7 | - | Unverified
7 | Stable Audio Open | FD_openl3 | 78.24 | - | Unverified
8 | TangoFlux | FD_openl3 | 75.1 | - | Unverified
9 | ETTA-FT-AC-100k | FD_openl3 | 61.79 | - | Unverified
10 | Diffsound | FAD | 7.75 | - | Unverified
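FD_openl3 and FAD are both Fréchet distances, and lower values mean the generated audio is statistically closer to the reference audio. Each set of clips, generated and reference, is passed through a pretrained embedding model. A Gaussian is fitted to each set of embeddings, and the reported score is the distance between the two Gaussians: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). As commonly defined, FD_openl3 uses OpenL3 embeddings, while the original FAD uses VGGish embeddings. The NumPy sketch below shows only the distance formula. It assumes the embeddings have already been computed and uses random arrays as hypothetical stand-ins. It is not the evaluation code of any listed model.

```python
import numpy as np


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_clips, embedding_dim), one row per clip.
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    # rowvar=False: rows are observations (clips), columns are embedding dims.
    # atleast_2d keeps the 1-dimensional case as a 1x1 matrix.
    cov_a = np.atleast_2d(np.cov(emb_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(emb_b, rowvar=False))
    # Tr((cov_a @ cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b. These are real and non-negative for covariance matrices,
    # so tiny imaginary parts or negatives from floating-point error are dropped.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = float(np.sqrt(np.clip(eigvals.real, 0.0, None)).sum())
    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    return mean_term + float(np.trace(cov_a) + np.trace(cov_b)) - 2.0 * tr_sqrt


# Hypothetical stand-ins for embeddings of reference and generated clips.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
generated = rng.normal(loc=0.3, size=(500, 8))
print(frechet_distance(reference, generated))
```

An FD_openl3 value and a FAD value are computed in different embedding spaces, so the two cannot be compared directly. The same caution applies to scores computed on different evaluation sets.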
# | Model | Metric | Claimed | Verified | Status
1 | VAB-Encodec (Ours) | Bits per byte | 40 | - | Unverified
2 | Sparse Transformer 152M (strided) | Bits per byte | 1.97 | - | Unverified
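Bits per byte is a likelihood metric for autoregressive models such as the Sparse Transformer. It is the model's average negative log2 probability for each byte that actually occurred, given the bytes before it. In other words, it is the cross-entropy in base 2, so lower is better, and a model that is uniform over all 256 byte values scores exactly 8. The sketch below assumes the per-byte probabilities assigned to the observed bytes are already available. The function names are illustrative and are not taken from any listed codebase.

```python
import math
from typing import Sequence


def bits_per_byte(target_probs: Sequence[float]) -> float:
    """Mean negative log2-likelihood per byte (hypothetical helper).

    target_probs[i] is the probability the model assigned to the byte that
    actually occurred at position i, conditioned on the preceding bytes.
    """
    if not target_probs:
        raise ValueError("need at least one byte")
    if any(not 0.0 < p <= 1.0 for p in target_probs):
        raise ValueError("probabilities must be in (0, 1]")
    return -sum(math.log2(p) for p in target_probs) / len(target_probs)


def nats_to_bits_per_byte(mean_nll_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats per byte to bits per byte."""
    return mean_nll_nats / math.log(2)


# A model that is uniform over all 256 byte values scores exactly 8 bits per byte.
print(bits_per_byte([1 / 256] * 1000))  # prints 8.0
```

The conversion helper is included because many training frameworks report cross-entropy with the natural logarithm, which must be divided by ln 2 to obtain bits.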
# | Model | Metric | Claimed | Verified | Status
1 | SymphonyNet | Human listening average results | 3.5 | - | Unverified