SOTAVerified

Audio Generation

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Papers

Showing 71-80 of 270 papers

Title | Status | Hype
Anytime Sampling for Autoregressive Models via Ordered Autoencoding | Code | 1
LLMBind: A Unified Modality-Task Integration Framework | Code | 1
LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music | Code | 1
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | Code | 1
Speech collage: code-switched audio generation by collaging monolingual corpora | Code | 1
It's Raw! Audio Generation with State-Space Models | Code | 1
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation | Code | 1
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses | Code | 1
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | Code | 1
Invisible Watermarking for Audio Generation Diffusion Models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AudioGen | FD_openl3 | 185.53 | - | Unverified
2 | AudioLDM2-large | FD_openl3 | 158.04 | - | Unverified
3 | Stable Audio 2.0 | FD_openl3 | 110.62 | - | Unverified
4 | Stable Audio | FD_openl3 | 103.66 | - | Unverified
5 | ETTA | FD_openl3 | 80.13 | - | Unverified
6 | TangoFlux-base | FD_openl3 | 79.7 | - | Unverified
7 | Stable Audio Open | FD_openl3 | 78.24 | - | Unverified
8 | TangoFlux | FD_openl3 | 75.1 | - | Unverified
9 | ETTA-FT-AC-100k | FD_openl3 | 61.79 | - | Unverified
10 | Diffsound | FAD | 7.75 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAB-Encodec (Ours) | Bits per byte | 40 | - | Unverified
2 | Sparse Transformer 152M (strided) | Bits per byte | 1.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SymphonyNet | Human listening average results | 3.5 | - | Unverified