SOTAVerified

Speech-to-Speech Translation

Speech-to-speech translation (S2ST) consists of translating speech in one language into speech in another language. This can be done with a text-centric cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems. Recently, work on S2ST that does not rely on an intermediate text representation has been emerging.
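
The cascade described above can be sketched as three composed stages. This is only an illustrative outline: the `CascadeS2ST` class, the `Audio` alias, and the `toy_*` functions are hypothetical names introduced here, and the toy stages are stand-ins rather than real ASR, MT, or TTS models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical alias: audio is modeled as a plain list of float samples.
Audio = List[float]


@dataclass
class CascadeS2ST:
    """Text-centric cascade: source speech -> source text -> target text -> target speech."""

    asr: Callable[[Audio], str]  # automatic speech recognition
    mt: Callable[[str], str]     # text-to-text machine translation
    tts: Callable[[str], Audio]  # text-to-speech synthesis

    def translate(self, source_audio: Audio) -> Audio:
        source_text = self.asr(source_audio)  # stage 1: transcribe
        target_text = self.mt(source_text)    # stage 2: translate the text
        return self.tts(target_text)          # stage 3: synthesize speech


# Toy stand-ins so the sketch runs end to end; they are not real models.
def toy_asr(audio: Audio) -> str:
    return "hello" if audio else ""


def toy_mt(text: str) -> str:
    return {"hello": "hola"}.get(text, text)


def toy_tts(text: str) -> Audio:
    # One fake "sample" per character of the target text.
    return [float(ord(ch)) for ch in text]


pipeline = CascadeS2ST(asr=toy_asr, mt=toy_mt, tts=toy_tts)
output_audio = pipeline.translate([0.1, 0.2, 0.3])
```

Because every stage communicates through text, an error made by the ASR stage is passed on to MT and TTS. A direct S2ST system would replace the three callables with a single speech-to-speech model.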

Papers

Showing 76–100 of 117 papers

| Title | Status | Hype |
| --- | --- | --- |
| Translatotron 3: Speech to Speech Translation with Monolingual Data | | 0 |
| UWSpeech: Speech to Speech Translation for Unwritten Languages | | 0 |
| Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | | 0 |
| What does it take to get state of the art in simultaneous speech-to-speech translation? | | 0 |
| A Case Study on Filtering for End-to-End Speech Translation | | 0 |
| A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation | | 0 |
| Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation | | 0 |
| Assessing Evaluation Metrics for Speech-to-Speech Translation | | 0 |
| AudioPaLM: A Large Language Model That Can Speak and Listen | | 0 |
| A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation | | 0 |
| Automatic Extraction of Parallel Speech Corpora from Dubbed Movies | | 0 |
| AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation | | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | | 0 |
| Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus | | 0 |
| Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis | | 0 |
| CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning | | 0 |
| DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation | | 0 |
| Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation | | 0 |
| Direct Punjabi to English speech translation using discrete units | | 0 |
| Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention | | 0 |
| Direct Speech-to-Speech Neural Machine Translation: A Survey | | 0 |
| Direct Speech to Speech Translation: A Review | | 0 |
| Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features | | 0 |
| Direct Text to Speech Translation System using Acoustic Units | | 0 |

Benchmark Results

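| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Hokkien→En (Two-pass decoding) | ASR-BLEU (Dev) | 13.6 | | Unverified |
| 2 | Hokkien→En (Two-stage) | ASR-BLEU (Dev) | 12.5 | | Unverified |
| 3 | Hokkien→En (Three-stage) | ASR-BLEU (Dev) | 12.5 | | Unverified |
| 4 | Hokkien→En (Single-pass decoding) | ASR-BLEU (Dev) | 8.8 | | Unverified |
| 5 | En→Hokkien (Two-pass decoding) | ASR-BLEU (Dev) | 7.8 | | Unverified |
| 6 | En→Hokkien (Three-stage) | ASR-BLEU (Dev) | 7.5 | | Unverified |
| 7 | En→Hokkien (Two-stage) | ASR-BLEU (Dev) | 7.1 | | Unverified |
| 8 | En→Hokkien (Single-pass decoding) | ASR-BLEU (Dev) | 6.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GenTranslateV2 | ASR-BLEU | 32.3 | | Unverified |
| 2 | GenTranslateV1 | ASR-BLEU | 30.1 | | Unverified |
| 3 | SeamlessM4T LargeV2 | ASR-BLEU | 29.4 | | Unverified |
| 4 | SeamlessM4T Large | ASR-BLEU | 25.8 | | Unverified |
| 5 | AudioPaLM2 | ASR-BLEU | 24 | | Unverified |
| 6 | WhisperV2 | ASR-BLEU | 23.5 | | Unverified |
| 7 | SeamlessM4T Medium | ASR-BLEU | 20.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | ASR-BLEU | 36.5 | | Unverified |
| 2 | SeamlessM4T Medium | ASR-BLEU | 28.1 | | Unverified |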