| M3ST: Mix at Three Levels for Speech Translation | Dec 7, 2022 | Data Augmentation, Diversity | —Unverified | 0 |
| MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation | Oct 22, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | Oct 5, 2023 | Speech-to-Text, Speech-to-Text Translation | —Unverified | 0 |
| NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022 | May 1, 2022 | Segmentation, Simultaneous Speech-to-Text Translation | —Unverified | 0 |
| NAIST Simultaneous Speech Translation System for IWSLT 2024 | Jun 30, 2024 | Speech-to-Speech Translation, Speech-to-Text | —Unverified | 0 |
| NeurST: Neural Speech Translation Toolkit | Dec 18, 2020 | Speech-to-Text Translation, Translation | —Unverified | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | Feb 26, 2025 | Audio Synthesis, Automatic Speech Recognition | —Unverified | 0 |
| On decoder-only architecture for speech-to-text and large language model integration | Jul 8, 2023 | Decoder, Language Modeling | —Unverified | 0 |
| XTREME-S: Evaluating Cross-lingual Speech Representations | Mar 21, 2022 | Representation Learning, Retrieval | —Unverified | 0 |
| A Comparative Study on End-to-end Speech to Text Translation | Nov 20, 2019 | Speech-to-Text, Speech-to-Text Translation | —Unverified | 0 |
| AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | Mar 18, 2025 | Decoder, Speech-to-Text | —Unverified | 0 |
| Analyzing ASR pretraining for low-resource speech-to-text translation | Oct 23, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting | Dec 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| A Survey on Speech Large Language Models | Oct 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| AudioPaLM: A Large Language Model That Can Speak and Listen | Jun 22, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | —Unverified | 0 |
| Bridging the Modality Gap for Speech-to-Text Translation | Oct 28, 2020 | Decoder, Speech-to-Text | —Unverified | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | —Unverified | 0 |
| Cross-lingual topic prediction for speech using translations | Aug 29, 2019 | Humanitarian, Prediction | —Unverified | 0 |
| CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders | Sep 14, 2023 | Contrastive Learning, Knowledge Distillation | —Unverified | 0 |
| Compact Speech Translation Models via Discrete Speech Units Pretraining | Feb 29, 2024 | Decoder, Self-Supervised Learning | —Unverified | 0 |
| Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | Oct 24, 2024 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Contextualized Translation of Automatically Segmented Speech | Aug 5, 2020 | Segmentation, Sentence | —Unverified | 0 |
| COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning | Nov 3, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Cross-modal Contrastive Learning for Speech Translation | Dec 17, 2021 | Contrastive Learning, Retrieval | —Unverified | 0 |
| Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing | Sep 27, 2023 | Decoder, Machine Translation | —Unverified | 0 |
| CTC Alignments Improve Autoregressive Translation | Oct 11, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning | Nov 11, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems | Oct 13, 2021 | Sentence, Simultaneous Speech-to-Text Translation | —Unverified | 0 |
| Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR | Jun 11, 2021 | Simultaneous Speech-to-Text Translation, Speech-to-Text | —Unverified | 0 |
| Efficient Monotonic Multihead Attention | Dec 7, 2023 | Simultaneous Speech-to-Text Translation, Speech-to-Text | —Unverified | 0 |
| End-to-End Offline Speech Translation System for IWSLT 2020 using Modality Agnostic Meta-Learning | Jul 1, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| End-to-End Speech-to-Text Translation: A Survey | Dec 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data | Jun 19, 2025 | Sentence, Speech-to-Text | —Unverified | 0 |
| Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | Apr 6, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Enhancing Speech-to-Speech Translation with Multiple TTS Targets | Apr 10, 2023 | Speech-to-Speech Translation, Speech-to-Text | —Unverified | 0 |
| Enhancing Transformer for End-to-end Speech-to-Text Translation | Aug 1, 2019 | Speech-to-Text, Speech-to-Text Translation | —Unverified | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | Apr 10, 2023 | Benchmarking, Simultaneous Speech-to-Text Translation | —Unverified | 0 |
| Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates | Nov 8, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | Jul 4, 2024 | Machine Translation, speech-recognition | —Unverified | 0 |
| Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages | Mar 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? | Dec 24, 2024 | Simultaneous Speech-to-Text Translation, Speech-to-Text | —Unverified | 0 |
| Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks | May 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Improved Cross-Lingual Transfer Learning For Automatic Speech Translation | Jun 1, 2023 | automatic-speech-translation, Cross-Lingual Transfer | —Unverified | 0 |
| SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Jun 20, 2024 | Speech-to-Text, Speech-to-Text Translation | —Unverified | 0 |
| SimulSpeech: End-to-End Simultaneous Speech to Text Translation | Jul 1, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| SpeechAlign: a Framework for Speech Translation Alignment Evaluation | Sep 20, 2023 | Speech-to-Text, Speech-to-Text Translation | —Unverified | 0 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | Oct 31, 2024 | Rhythm, speech-recognition | —Unverified | 0 |
| Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | May 30, 2025 | Cross-Lingual Transfer, Phoneme Recognition | —Unverified | 0 |