SOTAVerified

Speech-to-Text Translation

Speech-to-text translation converts speech audio in one language into text in another language, using either an end-to-end model or a cascade of speech recognition followed by machine translation.
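The difference between the end-to-end and cascade approaches can be sketched as two ways of composing functions. This is a minimal illustration only: the `Audio` alias, the function names, and the toy stand-in models are all hypothetical and do not correspond to any system listed on this page.

```python
from typing import Callable, List

# Stand-in for a waveform; real systems use sampled audio arrays (assumption).
Audio = List[float]


def cascade_translate(audio: Audio,
                      asr: Callable[[Audio], str],
                      mt: Callable[[str], str]) -> str:
    """Cascade: transcribe in the source language, then translate the transcript."""
    transcript = asr(audio)
    return mt(transcript)


def end_to_end_translate(audio: Audio,
                         st_model: Callable[[Audio], str]) -> str:
    """End-to-end: a single model maps audio directly to target-language text."""
    return st_model(audio)


# Toy stand-ins (hypothetical, not real models): they ignore the audio content.
def toy_asr(audio: Audio) -> str:
    return "hallo welt"


def toy_mt(text: str) -> str:
    return {"hallo welt": "hello world"}.get(text, text)


def toy_st(audio: Audio) -> str:
    return "hello world"


cascade_out = cascade_translate([0.0, 0.1], toy_asr, toy_mt)
direct_out = end_to_end_translate([0.0, 0.1], toy_st)
print(cascade_out, "|", direct_out)
```

In the cascade, any recognition error in the intermediate transcript is passed on to the translation step, whereas the end-to-end function has no intermediate text at all.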

Papers

Showing 51-100 of 146 papers

| Title | Status | Hype |
| --- | --- | --- |
| Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages | | 0 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | | 0 |
| A Survey on Speech Large Language Models | | 0 |
| Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | | 0 |
| Unveiling the Role of Pretraining in Direct Speech Translation | | 0 |
| CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Code | 0 |
| Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0 |
| Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | | 0 |
| Investigating Decoder-only Large Language Models for Speech-to-text Translation | | 0 |
| NAIST Simultaneous Speech Translation System for IWSLT 2024 | | 0 |
| Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Code | 0 |
| SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Code | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | | 0 |
| StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Code | 0 |
| Robust Semantic Communications for Speech Transmission | | 0 |
| Compact Speech Translation Models via Discrete Speech Units Pretraining | | 0 |
| Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | | 0 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases | | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Code | 0 |
| Efficient Monotonic Multihead Attention | | 0 |
| End-to-End Speech-to-Text Translation: A Survey | | 0 |
| COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning | | 0 |
| Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach | | 0 |
| Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | | 0 |
| Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing | | 0 |
| SpeechAlign: a Framework for Speech Translation Alignment Evaluation | | 0 |
| CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders | | 0 |
| An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation | Code | 0 |
| On decoder-only architecture for speech-to-text and large language model integration | | 0 |
| AudioPaLM: A Large Language Model That Can Speak and Listen | | 0 |
| Recent Advances in Direct Speech-to-text Translation | | 0 |
| Improved Cross-Lingual Transfer Learning For Automatic Speech Translation | | 0 |
| Strategies for improving low resource speech to text translation relying on pre-trained ASR models | | 0 |
| Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks | | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | Code | 0 |
| Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages | | 0 |
| WACO: Word-Aligned Contrastive Learning for Speech Translation | Code | 0 |
| M3ST: Mix at Three Levels for Speech Translation | | 0 |
| Efficient Speech Translation with Dynamic Latent Perceivers | Code | 0 |
| Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation | Code | 0 |
| Simple and Effective Unsupervised Speech Translation | | 0 |
| CTC Alignments Improve Autoregressive Translation | | 0 |
| M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation | Code | 0 |
| Language Model Augmented Monotonic Attention for Simultaneous Translation | | 0 |
| Revisiting End-to-End Speech-to-Text Translation From Scratch | Code | 0 |
| SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation | | 0 |
| Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation | | 0 |
| NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022 | | 0 |
| LibriS2S: A German-English Speech-to-Speech Translation Corpus | Code | 0 |

Benchmark Results

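| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Task Modulation + Multitask Learning(ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 28.88 | | Unverified |
| 2 | Wav2Vec2.0+mBART+Adaptors | Case-sensitive sacreBLEU | 28.22 | | Unverified |
| 3 | Transformer + Meta Learning(ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 27.51 | | Unverified |
| 4 | Transformer with Adapters | Case-sensitive sacreBLEU | 24.63 | | Unverified |
| 5 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 23.63 | | Unverified |
| 6 | Speechformer | Case-sensitive sacreBLEU | 23.6 | | Unverified |
| 7 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.8 | | Unverified |
| 8 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Transformer with Adapters | Case-sensitive sacreBLEU | 28.73 | | Unverified |
| 2 | Speechformer | Case-sensitive sacreBLEU | 28.5 | | Unverified |
| 3 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 28.12 | | Unverified |
| 4 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 27.4 | | Unverified |
| 5 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 26.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 33.45 | | Unverified |
| 2 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 33.3 | | Unverified |
| 3 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 32.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 30.6 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 26.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 34.1 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 29.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 21.5 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 19.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 24 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 20.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Transformer + ASR Pretrain + SpecAug | Case-insensitive sacreBLEU | 17.2 | | Unverified |
| 2 | Transformer + ASR Pretrain | Case-insensitive sacreBLEU | 16.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MediBeng Whisper Tiny | Bleu | 0.98 | | Unverified |
| 2 | Whisper Tiny | Bleu | 0.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Transformer with Adapters | SacreBLEU | 26.61 | | Unverified |
| 2 | Dual-decoder Transformer | SacreBLEU | 25.62 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Speechformer | Case-sensitive sacreBLEU | 27.7 | | Unverified |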