Speech Synthesis
Speech synthesis is the task of generating speech from another modality, such as text or lip movements.
Please note that the leaderboards here are not directly comparable between studies: they use mean opinion score (MOS) as a metric, and each study collects its own samples from Amazon Mechanical Turk.
(Image credit: WaveNet: A Generative Model for Raw Audio)
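To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score is commonly computed: the arithmetic mean of listener ratings on a 1 to 5 scale, reported with a confidence interval. The function name `mean_opinion_score`, the normal-approximation interval, and the ratings below are all assumptions made for illustration; they are not taken from any study listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is the half-width of a normal-approximation confidence
    interval (z=1.96 corresponds to roughly 95%). This is an assumed,
    simplified protocol, not the procedure of any particular paper.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from two separate listener pools (invented data).
study_a = [5, 4, 5, 4, 4, 5, 3, 5]
study_b = [4, 4, 5, 3, 4, 4, 5, 4]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

Because each pool of listeners rates different audio samples under different conditions, a gap between two such scores can reflect the raters and the samples as much as the models, which is why the tables below are best read within a single study.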
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PeriodWave-Turbo-L | PESQ | 4.45 | — | Unverified |
| 2 | BigVGAN-v2 | PESQ | 4.36 | — | Unverified |
| 3 | EVA-GAN-big | PESQ | 4.35 | — | Unverified |
| 4 | PeriodWave + FreeU | PESQ | 4.25 | — | Unverified |
| 5 | RFWave | PESQ | 4.23 | — | Unverified |
| 6 | BigVSAN (w/ snakebeta) | PESQ | 4.12 | — | Unverified |
| 7 | BigVSAN | PESQ | 4.12 | — | Unverified |
| 8 | EVA-GAN-base | PESQ | 4.03 | — | Unverified |
| 9 | BigVGAN | PESQ | 4.03 | — | Unverified |
| 10 | Vocos | PESQ | 3.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tacotron 2 | Mean Opinion Score | 4.53 | — | Unverified |
| 2 | WaveNet (Linguistic) | Mean Opinion Score | 4.34 | — | Unverified |
| 3 | WaveNet (L+F) | Mean Opinion Score | 4.21 | — | Unverified |
| 4 | Tacotron | Mean Opinion Score | 4 | — | Unverified |
| 5 | HMM-driven concatenative | Mean Opinion Score | 3.86 | — | Unverified |
| 6 | LSTM-RNN parametric | Mean Opinion Score | 3.67 | — | Unverified |
| 7 | means | Mean Opinion Score | 0 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BDDM vocoder | Mean Opinion Score | 4.48 | — | Unverified |
| 2 | DiffWave LARGE | Mean Opinion Score | 4.44 | — | Unverified |
| 3 | Neural HMM | Mean Opinion Score | 3.24 | — | Unverified |
| 4 | Neural HMM Ablation with 1 state per phone | Mean Opinion Score | 2.68 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | WaveNet (L+F) | Mean Opinion Score | 4.08 | — | Unverified |
| 2 | LSTM-RNN parametric | Mean Opinion Score | 3.79 | — | Unverified |
| 3 | HMM-driven concatenative | Mean Opinion Score | 3.47 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SampleRNN (2-tier) | NLL | 1.39 | — | Unverified |
| 2 | SampleRNN (3-tier) | NLL | 1.39 | — | Unverified |