SOTAVerified

Speech-to-Text Translation

Translate speech audio in one language into text in another language, using either a single end-to-end model or a cascade of speech recognition followed by machine translation.
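The difference between the two approaches can be illustrated with a minimal sketch. Everything in it is hypothetical: the function names, the `Audio` alias, and the toy stand-in models are assumptions made for illustration, not part of any system listed on this page. A cascade chains a speech recognizer and a text translator, while an end-to-end system maps source audio to target text in one step.

```python
from typing import Callable, Sequence

# Hypothetical alias: audio is modelled as a plain sequence of samples.
Audio = Sequence[float]


def cascade_translate(
    audio: Audio,
    asr: Callable[[Audio], str],
    mt: Callable[[str], str],
) -> str:
    """Cascade: transcribe in the source language, then translate the transcript."""
    transcript = asr(audio)
    return mt(transcript)


def end_to_end_translate(audio: Audio, st_model: Callable[[Audio], str]) -> str:
    """End-to-end: one model maps source audio directly to target-language text."""
    return st_model(audio)


# Toy stand-ins so the sketch runs; real systems would be trained models.
def toy_asr(audio: Audio) -> str:
    return "guten morgen"


def toy_mt(text: str) -> str:
    return {"guten morgen": "good morning"}.get(text, "<unk>")


def toy_st(audio: Audio) -> str:
    return "good morning"


if __name__ == "__main__":
    samples = [0.0, 0.1, -0.1]
    print(cascade_translate(samples, toy_asr, toy_mt))
    print(end_to_end_translate(samples, toy_st))
```

In the cascade, recognition errors in the intermediate transcript are passed on to the translation step. The end-to-end variant has no intermediate transcript at all.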

Papers

Showing 51-100 of 146 papers

Title | Status | Hype
An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation | Code | 0
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations | Code | 2
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Code | 2
On decoder-only architecture for speech-to-text and large language model integration | - | 0
AudioPaLM: A Large Language Model That Can Speak and Listen | - | 0
Recent Advances in Direct Speech-to-text Translation | - | 0
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation | - | 0
Strategies for improving low resource speech to text translation relying on pre-trained ASR models | - | 0
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation | Code | 1
DUB: Discrete Unit Back-translation for Speech Translation | Code | 1
Back Translation for Speech-to-text Translation Without Transcripts | Code | 1
Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks | - | 0
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | Code | 0
Enhancing Speech-to-Speech Translation with Multiple TTS Targets | - | 0
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages | - | 0
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Code | 2
Pre-training for Speech Translation: CTC Meets Optimal Transport | Code | 1
WACO: Word-Aligned Contrastive Learning for Speech Translation | Code | 0
M3ST: Mix at Three Levels for Speech Translation | - | 0
Efficient Speech Translation with Dynamic Latent Perceivers | Code | 0
Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation | Code | 0
Simple and Effective Unsupervised Speech Translation | - | 0
CTC Alignments Improve Autoregressive Translation | - | 0
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation | Code | 0
Language Model Augmented Monotonic Attention for Simultaneous Translation | - | 0
Revisiting End-to-End Speech-to-Text Translation From Scratch | Code | 0
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation | - | 0
Cross-modal Contrastive Learning for Speech Translation | Code | 1
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages | Code | 1
Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation | - | 0
NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022 | - | 0
LibriS2S: A German-English Speech-to-Speech Translation Corpus | Code | 0
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | - | 0
XTREME-S: Evaluating Cross-lingual Speech Representations | - | 0
STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation | Code | 1
A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Code | 1
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation | Code | 1
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation | Code | 2
Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement | Code | 1
Cross-modal Contrastive Learning for Speech Translation | - | 0
Improve Sinhala Speech Recognition Through e2e LF-MMI Model | - | 0
An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting | - | 0
Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems | - | 0
Learning When to Translate for Streaming Speech | Code | 1
Speechformer: Reducing Information Loss in Direct Speech Translation | Code | 0
Infusing Future Information into Monotonic Attention Through Language Models | Code | 0
Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task | - | 0
Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling | - | 0
Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Task Modulation + Multitask Learning (ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 28.88 | - | Unverified
2 | Wav2Vec2.0+mBART+Adaptors | Case-sensitive sacreBLEU | 28.22 | - | Unverified
3 | Transformer + Meta Learning (ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 27.51 | - | Unverified
4 | Transformer with Adapters | Case-sensitive sacreBLEU | 24.63 | - | Unverified
5 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 23.63 | - | Unverified
6 | Speechformer | Case-sensitive sacreBLEU | 23.6 | - | Unverified
7 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.8 | - | Unverified
8 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Transformer with Adapters | Case-sensitive sacreBLEU | 28.73 | - | Unverified
2 | Speechformer | Case-sensitive sacreBLEU | 28.5 | - | Unverified
3 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 28.12 | - | Unverified
4 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 27.4 | - | Unverified
5 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 26.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 33.45 | - | Unverified
2 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 33.3 | - | Unverified
3 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 32.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SeamlessM4T Large | BLEU | 30.6 | - | Unverified
2 | SeamlessM4T Medium | BLEU | 26.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SeamlessM4T Large | BLEU | 34.1 | - | Unverified
2 | SeamlessM4T Medium | BLEU | 29.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SeamlessM4T Large | BLEU | 21.5 | - | Unverified
2 | SeamlessM4T Medium | BLEU | 19.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SeamlessM4T Large | BLEU | 24 | - | Unverified
2 | SeamlessM4T Medium | BLEU | 20.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Transformer + ASR Pretrain + SpecAug | Case-insensitive sacreBLEU | 17.2 | - | Unverified
2 | Transformer + ASR Pretrain | Case-insensitive sacreBLEU | 16.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MediBeng Whisper Tiny | Bleu | 0.98 | - | Unverified
2 | Whisper Tiny | Bleu | 0.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Transformer with Adapters | SacreBLEU | 26.61 | - | Unverified
2 | Dual-decoder Transformer | SacreBLEU | 25.62 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Speechformer | Case-sensitive sacreBLEU | 27.7 | - | Unverified