Text-To-Speech Synthesis
Text-to-speech (TTS) synthesis is a machine learning task that converts written text into spoken audio. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.
Papers
Datasets: LJSpeech, 20000 utterances, CMUDict 0.7b, HUI speech corpus, Thorsten voice 21.02 neutral, Trinity Speech-Gesture Dataset
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tacotron 2 | Mean Opinion Score | 3.49 | — | Unverified |
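The Mean Opinion Score is the arithmetic mean of listener ratings, usually collected on a 1 to 5 absolute category scale. The sketch below shows that calculation together with an approximate 95% confidence interval. It is only an illustration: the ratings are made-up numbers, and the normal-approximation interval (1.96 × standard error) is an assumption. Published evaluations differ in listener count, rating protocol and interval method, and the 3.49 in the table was not computed this way.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% CI) for 1-5 ratings.

    Assumption: the interval uses a normal approximation, 1.96 * stdev / sqrt(n).
    """
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        # Absolute category rating scale: 1 (bad) to 5 (excellent)
        if not 1 <= r <= 5:
            raise ValueError(f"rating out of range: {r}")
    mos = mean(ratings)
    if len(ratings) < 2:
        # A sample standard deviation needs at least two ratings
        return mos, float("nan")
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from ten listeners for one synthesized utterance set
ratings = [4, 3, 4, 5, 3, 3, 4, 2, 4, 3]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")  # prints "MOS = 3.50 ± 0.53"
```

The interval matters when comparing models. A difference between two MOS values can fall inside the confidence intervals when only a few listeners rate each system.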