SOTAVerified

Speech-to-Speech Translation

Speech-to-speech translation (S2ST) consists of translating speech in one language into speech in another language. This can be done with a text-centric cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems. More recently, work on S2ST that does not rely on an intermediate text representation has been emerging.
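The cascade described above can be sketched as three stages chained together. This is a minimal illustration, not any listed paper's system: the `CascadeS2ST` class, the `Audio` type alias, and the `toy_*` functions are all hypothetical stand-ins, and a real pipeline would plug trained ASR, MT, and TTS models into the same three slots.

```python
from dataclasses import dataclass
from typing import Callable, List

# Placeholder waveform type (assumption): a list of float samples.
Audio = List[float]


@dataclass
class CascadeS2ST:
    """Text-centric cascade: source speech -> text -> translated text -> speech."""
    asr: Callable[[Audio], str]   # speech recognition in the source language
    mt: Callable[[str], str]      # text-to-text machine translation
    tts: Callable[[str], Audio]   # speech synthesis in the target language

    def translate(self, source_audio: Audio) -> Audio:
        source_text = self.asr(source_audio)   # stage 1: transcribe
        target_text = self.mt(source_text)     # stage 2: translate the transcript
        return self.tts(target_text)           # stage 3: synthesize target speech


# Toy stand-ins (hypothetical) so the sketch runs without any models.
def toy_asr(audio: Audio) -> str:
    return "hola"  # pretend every input is recognized as "hola"


def toy_mt(text: str) -> str:
    return {"hola": "hello"}.get(text, text)  # one-entry lookup "translator"


def toy_tts(text: str) -> Audio:
    return [float(ord(ch)) for ch in text]  # one fake sample per character


pipeline = CascadeS2ST(asr=toy_asr, mt=toy_mt, tts=toy_tts)
output = pipeline.translate([0.0, 0.1, 0.2])
```

Because every stage communicates through text, errors made by the ASR stage propagate into MT and TTS, and non-textual information such as prosody is lost at the first step. That is part of the motivation for the direct and textless approaches listed below.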

Papers

Showing 26-50 of 117 papers

| Title | Status | Hype |
|---|---|---|
| FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Code | 11 |
| NAIST Simultaneous Speech Translation System for IWSLT 2024 |  | 0 |
| Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation |  | 0 |
| CTC-based Non-autoregressive Textless Speech-to-Speech Translation | Code | 1 |
| A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Code | 2 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? |  | 0 |
| StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Code | 5 |
| Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing |  | 0 |
| Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation |  | 0 |
| SimulTron: On-Device Simultaneous Speech to Speech Translation |  | 0 |
| SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought |  | 0 |
| TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | Code | 2 |
| CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning |  | 0 |
| DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation | Code | 0 |
| MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation |  | 0 |
| Direct Punjabi to English speech translation using discrete units |  | 0 |
| GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators | Code | 2 |
| A Case Study on Filtering for End-to-End Speech Translation |  | 0 |
| TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data |  | 0 |
| TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation |  | 0 |
| EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models | Code | 1 |
| AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation | Code | 1 |
| DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation |  | 0 |
| DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation | Code | 1 |
| Enhancing expressivity transfer in textless speech-to-speech translation |  | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Hokkien→En (Two-pass decoding) | ASR-BLEU (Dev) | 13.6 |  | Unverified |
| 2 | Hokkien→En (Three-stage) | ASR-BLEU (Dev) | 12.5 |  | Unverified |
| 3 | Hokkien→En (Two-stage) | ASR-BLEU (Dev) | 12.5 |  | Unverified |
| 4 | Hokkien→En (Single-pass decoding) | ASR-BLEU (Dev) | 8.8 |  | Unverified |
| 5 | En→Hokkien (Two-pass decoding) | ASR-BLEU (Dev) | 7.8 |  | Unverified |
| 6 | En→Hokkien (Three-stage) | ASR-BLEU (Dev) | 7.5 |  | Unverified |
| 7 | En→Hokkien (Two-stage) | ASR-BLEU (Dev) | 7.1 |  | Unverified |
| 8 | En→Hokkien (Single-pass decoding) | ASR-BLEU (Dev) | 6.6 |  | Unverified |
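
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GenTranslateV2 | ASR-BLEU | 32.3 |  | Unverified |
| 2 | GenTranslateV1 | ASR-BLEU | 30.1 |  | Unverified |
| 3 | SeamlessM4T LargeV2 | ASR-BLEU | 29.4 |  | Unverified |
| 4 | SeamlessM4T Large | ASR-BLEU | 25.8 |  | Unverified |
| 5 | AudioPaLM2 | ASR-BLEU | 24 |  | Unverified |
| 6 | WhisperV2 | ASR-BLEU | 23.5 |  | Unverified |
| 7 | SeamlessM4T Medium | ASR-BLEU | 20.4 |  | Unverified |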

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SeamlessM4T Large | ASR-BLEU | 36.5 |  | Unverified |
| 2 | SeamlessM4T Medium | ASR-BLEU | 28.1 |  | Unverified |