Speech-to-Speech Translation
Speech-to-speech translation (S2ST) consists of translating speech in one language into speech in another language. This can be done with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems, an approach that is text-centric. Recently, work on S2ST that does not rely on an intermediate text representation has been emerging.
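The cascaded approach can be pictured as three functions composed in sequence. The sketch below is illustrative only: the class name `CascadeS2ST`, the stage type aliases, and the toy stand-in stages are all hypothetical and do not correspond to any particular system or library. In practice each stage would wrap a trained model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions made for this sketch).
ASR = Callable[[bytes], str]   # source speech -> source text
MT = Callable[[str], str]      # source text   -> target text
TTS = Callable[[str], bytes]   # target text   -> target speech


@dataclass
class CascadeS2ST:
    """Text-centric cascade: ASR, then MT, then TTS."""
    asr: ASR
    mt: MT
    tts: TTS

    def translate(self, source_audio: bytes) -> bytes:
        source_text = self.asr(source_audio)   # 1. recognize source speech
        target_text = self.mt(source_text)     # 2. translate the transcript
        return self.tts(target_text)           # 3. synthesize target speech


# Toy stand-ins so the sketch runs end to end; "audio" is just UTF-8 bytes.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    lexicon = {"hello": "hola", "world": "mundo"}
    # Words missing from the toy lexicon are passed through unchanged.
    return " ".join(lexicon.get(word, word) for word in text.split())


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadeS2ST(asr=toy_asr, mt=toy_mt, tts=toy_tts)
translated_audio = pipeline.translate(b"hello world")
```

Because every stage communicates through text, an error made by the ASR stage is passed on to MT and TTS. Direct S2ST systems avoid this hand-off by not relying on an intermediate text representation.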
Benchmark Results
| # | Direction (system) | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Hokkien→En (Two-pass decoding) | ASR-BLEU (Dev) | 13.6 | — | Unverified |
| 2 | Hokkien→En (Two-stage) | ASR-BLEU (Dev) | 12.5 | — | Unverified |
| 3 | Hokkien→En (Three-stage) | ASR-BLEU (Dev) | 12.5 | — | Unverified |
| 4 | Hokkien→En (Single-pass decoding) | ASR-BLEU (Dev) | 8.8 | — | Unverified |
| 5 | En→Hokkien (Two-pass decoding) | ASR-BLEU (Dev) | 7.8 | — | Unverified |
| 6 | En→Hokkien (Three-stage) | ASR-BLEU (Dev) | 7.5 | — | Unverified |
| 7 | En→Hokkien (Two-stage) | ASR-BLEU (Dev) | 7.1 | — | Unverified |
| 8 | En→Hokkien (Single-pass decoding) | ASR-BLEU (Dev) | 6.6 | — | Unverified |