SOTAVerified

Speech-to-Text Translation

Speech-to-text translation converts speech audio in one language into text in another language, using either an end-to-end model or a cascade of speech recognition followed by machine translation.

Papers

Showing 1–50 of 146 papers

| Title | Status | Hype |
|---|---|---|
| PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6 |
| StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Code | 5 |
| MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Code | 2 |
| CVSS Corpus and Massively Multilingual Speech-to-Speech Translation | Code | 2 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Code | 2 |
| SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Code | 2 |
| SONAR: Sentence-Level Multimodal and Language-Agnostic Representations | Code | 2 |
| Cross-modal Contrastive Learning for Speech Translation | Code | 1 |
| "Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation | Code | 1 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Code | 1 |
| Back Translation for Speech-to-text Translation Without Transcripts | Code | 1 |
| End-to-end Speech Translation via Cross-modal Progressive Training | Code | 1 |
| ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Code | 1 |
| FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks | Code | 1 |
| End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021 | Code | 1 |
| Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement | Code | 1 |
| Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages | Code | 1 |
| DUB: Discrete Unit Back-translation for Speech Translation | Code | 1 |
| ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation | Code | 1 |
| Consecutive Decoding for Speech-to-text Translation | Code | 1 |
| SHAS: Approaching optimal Segmentation for End-to-End Speech Translation | Code | 1 |
| Learning When to Translate for Streaming Speech | Code | 1 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Code | 1 |
| CoVoST 2 and Massively Multilingual Speech-to-Text Translation | Code | 1 |
| Pre-training for Speech Translation: CTC Meets Optimal Transport | Code | 1 |
| STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation | Code | 1 |
| Lightweight Adapter Tuning for Multilingual Speech Translation | Code | 1 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Code | 1 |
| LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Code | 1 |
| Pushing the Limits of Zero-shot End-to-End Speech Translation | Code | 1 |
| CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus | Code | 1 |
| Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation | Code | 1 |
| LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions | Code | 1 |
| Learning Shared Semantic Space for Speech-to-Text Translation | Code | 1 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1 |
| Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation | Code | 1 |
| CTC Alignments Improve Autoregressive Translation | | 0 |
| Bridging the Modality Gap for Speech-to-Text Translation | | 0 |
| Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing | | 0 |
| AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | | 0 |
| Cross-modal Contrastive Learning for Speech Translation | | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | | 0 |
| An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting | | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | | 0 |
| Enhancing Transformer for End-to-end Speech-to-Text Translation | | 0 |
| Improved Cross-Lingual Transfer Learning For Automatic Speech Translation | | 0 |
| Improve Sinhala Speech Recognition Through e2e LF-MMI Model | | 0 |
| Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation | | 0 |
| Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | | 0 |
| COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Task Modulation + Multitask Learning(ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 28.88 | | Unverified |
| 2 | Wav2Vec2.0+mBART+Adaptors | Case-sensitive sacreBLEU | 28.22 | | Unverified |
| 3 | Transformer + Meta Learning(ASR/MT) + Data Augmentation | Case-sensitive sacreBLEU | 27.51 | | Unverified |
| 4 | Transformer with Adapters | Case-sensitive sacreBLEU | 24.63 | | Unverified |
| 5 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 23.63 | | Unverified |
| 6 | Speechformer | Case-sensitive sacreBLEU | 23.6 | | Unverified |
| 7 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.8 | | Unverified |
| 8 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Transformer with Adapters | Case-sensitive sacreBLEU | 28.73 | | Unverified |
| 2 | Speechformer | Case-sensitive sacreBLEU | 28.5 | | Unverified |
| 3 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 28.12 | | Unverified |
| 4 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 27.4 | | Unverified |
| 5 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 26.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Dual-decoder Transformer | Case-sensitive sacreBLEU | 33.45 | | Unverified |
| 2 | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 33.3 | | Unverified |
| 3 | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 32.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SeamlessM4T Large | BLEU | 30.6 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 26.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SeamlessM4T Large | BLEU | 34.1 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 29.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SeamlessM4T Large | BLEU | 21.5 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 19.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SeamlessM4T Large | BLEU | 24 | | Unverified |
| 2 | SeamlessM4T Medium | BLEU | 20.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Transformer + ASR Pretrain + SpecAug | Case-insensitive sacreBLEU | 17.2 | | Unverified |
| 2 | Transformer + ASR Pretrain | Case-insensitive sacreBLEU | 16.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MediBeng Whisper Tiny | BLEU | 0.98 | | Unverified |
| 2 | Whisper Tiny | BLEU | 0.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Transformer with Adapters | SacreBLEU | 26.61 | | Unverified |
| 2 | Dual-decoder Transformer | SacreBLEU | 25.62 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Speechformer | Case-sensitive sacreBLEU | 27.7 | | Unverified |