Speech-to-Speech Translation

Speech-to-speech translation (S2ST) consists of translating speech in one language into speech in another language. This can be done with a text-centric cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems. More recently, work on S2ST that does not rely on an intermediate text representation has been emerging.
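
The cascaded approach described above can be sketched as three chained stages. This is a minimal illustration only: the class name `CascadeS2ST`, the `Audio` type alias, and the toy stage functions are all hypothetical placeholders, not part of any toolkit listed on this page. In a real system each stage would wrap a trained model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: audio is represented as a plain list of float samples.
Audio = List[float]


@dataclass
class CascadeS2ST:
    """Text-centric cascade: ASR -> MT -> TTS (hypothetical sketch)."""
    asr: Callable[[Audio], str]   # source speech -> source text
    mt: Callable[[str], str]      # source text -> target text
    tts: Callable[[str], Audio]   # target text -> target speech

    def translate(self, source_audio: Audio) -> Audio:
        source_text = self.asr(source_audio)
        target_text = self.mt(source_text)
        return self.tts(target_text)


# Toy stand-ins so the sketch runs end to end; they do no real modelling.
def toy_asr(audio: Audio) -> str:
    return "hola mundo"


def toy_mt(text: str) -> str:
    return {"hola mundo": "hello world"}.get(text, text)


def toy_tts(text: str) -> Audio:
    # Pretend each character becomes one audio sample.
    return [float(ord(ch)) for ch in text]


pipeline = CascadeS2ST(asr=toy_asr, mt=toy_mt, tts=toy_tts)
target_audio = pipeline.translate([0.0, 0.1, -0.1])
```

Because every stage communicates through text, errors made by the ASR stage propagate to MT and TTS, and non-textual information in the source speech is not carried across the text interface. Direct S2ST systems aim to avoid that intermediate text step.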

Papers

Showing 51–100 of 117 papers

Title	Date	Tasks	Status
SimulTron: On-Device Simultaneous Speech to Speech Translation	Jun 4, 2024	Simultaneous Speech-to-Speech Translation, Speech-to-Speech Translation	Unverified
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing	Jun 4, 2024	Decoder, Language Modeling	Unverified
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought	May 30, 2024	Language Modeling, Language Modelling	Unverified
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning	May 23, 2024	es-en, fr-en	Unverified
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation	May 22, 2024	Denoising, Noise Estimation	Code Available
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation	Mar 19, 2024	Decoder, Language Modeling	Unverified
Direct Punjabi to English speech translation using discrete units	Feb 25, 2024	Speech-to-Speech Translation, Speech-to-Text	Unverified
A Case Study on Filtering for End-to-End Speech Translation	Feb 2, 2024	Speech-to-Speech Translation, Speech-to-Text	Unverified
TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data	Jan 17, 2024	Sentence, Speech-to-Speech Translation	Unverified
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation	Dec 23, 2023	es-en, fr-en	Unverified
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation	Oct 26, 2023	Image Generation, Speech-to-Speech Translation	Unverified
Enhancing expressivity transfer in textless speech-to-speech translation	Oct 11, 2023	Self-Supervised Learning, Speech-to-Speech Translation	Unverified
Direct Text to Speech Translation System using Acoustic Units	Sep 14, 2023	Decoder, Speech-to-Speech Translation	Unverified
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer	Sep 14, 2023	In-Context Learning, Language Modeling	Unverified
Multilingual Speech-to-Speech Translation into Multiple Target Languages	Jul 17, 2023	Language Identification, Speech-to-Speech Translation	Unverified
Towards cross-language prosody transfer for dialog	Jul 9, 2023	Speech-to-Speech Translation, Translation	Code Available
AudioPaLM: A Large Language Model That Can Speak and Listen	Jun 22, 2023	Language Modeling, Language Modelling	Unverified
PolyVoice: Language Models for Speech to Speech Translation	Jun 5, 2023	Language Modeling, Language Modelling	Unverified
Translatotron 3: Speech to Speech Translation with Monolingual Data	May 27, 2023	Speech-to-Speech Translation, Translation	Unverified
Textless Speech-to-Speech Translation With Limited Parallel Data	May 24, 2023	Automatic Speech Recognition, Denoising	Code Available
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation	May 24, 2023	Speech-to-Speech Translation, Translation	Unverified
i-Code Studio: A Configurable and Composable Framework for Integrative AI	May 23, 2023	Question Answering, Retrieval	Unverified
Duplex Diffusion Models Improve Speech-to-Speech Translation	May 22, 2023	Speech-to-Speech Translation, Translation	Unverified
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit	Apr 10, 2023	Benchmarking, Simultaneous Speech-to-Text Translation	Unverified
Enhancing Speech-to-Speech Translation with Multiple TTS Targets	Apr 10, 2023	Speech-to-Speech Translation, Speech-to-Text	Unverified
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation	Jan 25, 2023	Speech-to-Speech Translation, Translation	Unverified
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units	Dec 15, 2022	Decoder, Denoising	Unverified
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features	Dec 12, 2022	Speech-to-Speech Translation, Translation	Unverified
Dialogs Re-enacted Across Languages	Nov 18, 2022	Speech-to-Speech Translation, Translation	Code Available
Speech-to-Speech Translation For A Real-world Unwritten Language	Nov 11, 2022	Speech-to-Speech Translation, Translation	Unverified
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations	Nov 8, 2022	Mixture-of-Experts, Speech-to-Speech Translation	Unverified
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation	Oct 31, 2022	Speech-to-Speech Translation, Translation	Unverified
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation	Oct 31, 2022	Speech-to-Speech Translation, Translation	Unverified
Improving Speech-to-Speech Translation Through Unlabeled Text	Oct 26, 2022	Machine Translation, speech-recognition	Unverified
A Textless Metric for Speech-to-Speech Comparison	Oct 21, 2022	Sentence, Speech-to-Speech Translation	Code Available
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling	Sep 30, 2022	Language Modeling, Language Modelling	Unverified
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation	May 17, 2022	Representation Learning, Retrieval	Unverified
Findings of the IWSLT 2022 Evaluation Campaign	May 1, 2022	Speech-to-Speech Translation, Translation	Unverified
Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022	May 1, 2022	Decoder, Knowledge Distillation	Code Available
The HW-TSC’s Speech to Speech Translation System for IWSLT 2022 Evaluation	May 1, 2022	Machine Translation, Reranking	Unverified
MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks	May 1, 2022	Simultaneous Speech-to-Text Translation, Speech-to-Speech Translation	Unverified
LibriS2S: A German-English Speech-to-Speech Translation Corpus	Apr 22, 2022	Speech-to-Speech Translation, Speech-to-Text	Code Available
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation	Apr 6, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Prosodic Alignment for off-screen automatic dubbing	Apr 6, 2022	Speech-to-Speech Translation, Translation	Unverified
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation	Mar 24, 2022	Representation Learning, Speech Representation Learning	Unverified
Evaluating MT Systems: A Theoretical Framework	Feb 11, 2022	Machine Translation, Speech-to-Speech Translation	Unverified
Textless Speech-to-Speech Translation on Real Data	Dec 15, 2021	Speech-to-Speech Translation, Translation	Unverified
Multimodal and Multilingual Embeddings for Large-Scale Speech Mining	Dec 1, 2021	Speech-to-Speech Translation, Translation	Unverified
Assessing Evaluation Metrics for Speech-to-Speech Translation	Oct 26, 2021	Machine Translation, Open-Ended Question Answering	Unverified
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation	Oct 15, 2021	Data Augmentation, Simultaneous Speech-to-Speech Translation	Unverified

Datasets: TAT, FLEURS X-eng, CVSS

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Hokkien→En (Two-pass decoding)	ASR-BLEU (Dev)	13.6	—	Unverified
2	Hokkien→En (Two-stage)	ASR-BLEU (Dev)	12.5	—	Unverified
3	Hokkien→En (Three-stage)	ASR-BLEU (Dev)	12.5	—	Unverified
4	Hokkien→En (Single-pass decoding)	ASR-BLEU (Dev)	8.8	—	Unverified
5	En→Hokkien (Two-pass decoding)	ASR-BLEU (Dev)	7.8	—	Unverified
6	En→Hokkien (Three-stage)	ASR-BLEU (Dev)	7.5	—	Unverified
7	En→Hokkien (Two-stage)	ASR-BLEU (Dev)	7.1	—	Unverified
8	En→Hokkien (Single-pass decoding)	ASR-BLEU (Dev)	6.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GenTranslateV2	ASR-BLEU	32.3	—	Unverified
2	GenTranslateV1	ASR-BLEU	30.1	—	Unverified
3	SeamlessM4T LargeV2	ASR-BLEU	29.4	—	Unverified
4	SeamlessM4T Large	ASR-BLEU	25.8	—	Unverified
5	AudioPaLM2	ASR-BLEU	24	—	Unverified
6	WhisperV2	ASR-BLEU	23.5	—	Unverified
7	SeamlessM4T Medium	ASR-BLEU	20.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SeamlessM4T Large	ASR-BLEU	36.5	—	Unverified
2	SeamlessM4T Medium	ASR-BLEU	28.1	—	Unverified
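
The ASR-BLEU metric reported in the tables above scores translated speech by first transcribing it with an ASR system and then computing BLEU between the transcripts and reference text. The sketch below illustrates that idea under loud assumptions: `corpus_bleu` is a simplified, unsmoothed BLEU with whitespace tokenization and one reference per segment, and `toy_asr` is a hypothetical lookup standing in for a real recognizer. It is not the scoring script behind the numbers listed here, which typically depend on a specific ASR model and a standard BLEU implementation.

```python
import math
from collections import Counter
from typing import Callable, List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU (0-100), no smoothing, single reference."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.lower().split(), ref.lower().split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            totals[n - 1] += sum(hc.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


def asr_bleu(clips: List[str], refs: List[str], asr: Callable[[str], str]) -> float:
    """Transcribe each generated clip with `asr`, then score against the references."""
    transcripts = [asr(clip) for clip in clips]
    return corpus_bleu(transcripts, refs)


# Hypothetical stand-in for an ASR model: maps a clip id to a fixed transcript.
TOY_TRANSCRIPTS = {"clip_1": "the cat sat on the mat", "clip_2": "it is raining today"}


def toy_asr(clip_id: str) -> str:
    return TOY_TRANSCRIPTS[clip_id]


perfect = asr_bleu(["clip_1", "clip_2"],
                   ["the cat sat on the mat", "it is raining today"], toy_asr)
mismatch = asr_bleu(["clip_1"], ["completely different words here"], toy_asr)
```

Because the score passes through an ASR system, recognition errors lower the result even when the translated speech itself is correct, so values are only comparable when the same ASR model and BLEU settings are used.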