SOTAVerified

Speech-to-Text Translation

Speech-to-text translation converts speech audio in one language into text in another language, using either an end-to-end model or a cascade of speech recognition followed by machine translation.
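The difference between the cascade and end-to-end approaches in the task description can be illustrated with a minimal sketch. All names here (`cascade_st`, `end_to_end_st`, and the `toy_*` stand-ins) are hypothetical placeholders for illustration, not the API of any listed system; real models would replace the toy functions.

```python
from typing import Callable, Sequence

Audio = Sequence[float]  # assumption: a waveform as a sequence of samples


def cascade_st(audio: Audio,
               asr: Callable[[Audio], str],
               mt: Callable[[str], str]) -> str:
    """Cascade: transcribe in the source language, then translate the text."""
    transcript = asr(audio)   # source-language transcript (intermediate text)
    return mt(transcript)     # target-language text


def end_to_end_st(audio: Audio, st_model: Callable[[Audio], str]) -> str:
    """End-to-end: one model maps audio directly to target-language text."""
    return st_model(audio)


# Toy stand-ins so the sketch runs; they ignore the audio content entirely.
def toy_asr(audio: Audio) -> str:
    return "hallo welt"


def toy_mt(text: str) -> str:
    return {"hallo welt": "hello world"}.get(text, text)


def toy_st(audio: Audio) -> str:
    return "hello world"


if __name__ == "__main__":
    samples = [0.0, 0.1, -0.1]
    print(cascade_st(samples, toy_asr, toy_mt))
    print(end_to_end_st(samples, toy_st))
```

In the cascade, an error in the intermediate transcript propagates to the translation, while the end-to-end form has no intermediate text to inspect or correct.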

Papers

Showing 1-50 of 146 papers

| Title | Status | Hype |
| --- | --- | --- |
| End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data | | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | | 0 |
| Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | | 0 |
| Improving Language and Modality Transfer in Translation by Character-level Modeling | | 0 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | Code | 0 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Code | 1 |
| AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | | 0 |
| SparQLe: Speech Queries to Text Translation Through LLMs | Code | 0 |
| Speech to Speech Translation with Translatotron: A State of the Art Review | | 0 |
| When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0 |
| How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? | | 0 |
| Representation Purification for End-to-End Speech Translation | | 0 |
| Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages | | 0 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | | 0 |
| A Survey on Speech Large Language Models | | 0 |
| Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | | 0 |
| Unveiling the Role of Pretraining in Direct Speech Translation | | 0 |
| LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Code | 1 |
| CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Code | 0 |
| Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Code | 0 |
| Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | | 0 |
| Investigating Decoder-only Large Language Models for Speech-to-text Translation | | 0 |
| NAIST Simultaneous Speech Translation System for IWSLT 2024 | | 0 |
| Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Code | 0 |
| ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1 |
| SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Code | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | | 0 |
| StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Code | 0 |
| StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Code | 5 |
| LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions | Code | 1 |
| Robust Semantic Communications for Speech Transmission | | 0 |
| Compact Speech Translation Models via Discrete Speech Units Pretraining | | 0 |
| Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | | 0 |
| Pushing the Limits of Zero-shot End-to-End Speech Translation | Code | 1 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases | | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Code | 0 |
| Efficient Monotonic Multihead Attention | | 0 |
| End-to-End Speech-to-Text Translation: A Survey | | 0 |
| COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning | | 0 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Code | 1 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2 |
| Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach | | 0 |
| Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | | 0 |
| Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing | | 0 |
| SpeechAlign: a Framework for Speech Translation Alignment Evaluation | | 0 |
| CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders | | 0 |

Benchmark Results

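| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Task Modulation + Multitask Learning(ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 28.88 | | Unverified |
| 2 | Wav2Vec2.0+mBART+Adaptors | Case-sensitive sacreBLEU | 28.22 | | Unverified |
| 3 | Transformer + Meta Learning(ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 27.51 | | Unverified |
| 4 | Transformer with Adapters | Case-sensitive sacreBLEU | 24.63 | | Unverified |
| 5 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 23.63 | | Unverified |
| 6 | Speechformer | Case-sensitive sacreBLEU | 23.6 | | Unverified |
| 7 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.8 | | Unverified |
| 8 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Transformer with Adapters | Case-sensitive sacreBLEU | 28.73 | | Unverified |
| 2 | Speechformer | Case-sensitive sacreBLEU | 28.5 | | Unverified |
| 3 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 28.12 | | Unverified |
| 4 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 27.4 | | Unverified |
| 5 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 26.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 33.45 | | Unverified |
| 2 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 33.3 | | Unverified |
| 3 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 32.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 30.6 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 26.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 34.1 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 29.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 21.5 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 19.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SeamlessM4T Large | BLEU | 24 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 20.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Transformer + ASR Pretrain + SpecAug | Case-insensitive sacreBLEU | 17.2 | | Unverified |
| 2 | Transformer + ASR Pretrain | Case-insensitive sacreBLEU | 16.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MediBeng Whisper Tiny | Bleu | 0.98 | | Unverified |
| 2 | Whisper Tiny | Bleu | 0.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Transformer with Adapters | SacreBLEU | 26.61 | | Unverified |
| 2 | Dual-decoder Transformer | SacreBLEU | 25.62 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Speechformer | Case-sensitive sacreBLEU | 27.7 | | Unverified |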