| End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data | Jun 19, 2025 | SentenceSpeech-to-Text | —Unverified | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading ComprehensionSpeech Synthesis | —Unverified | 0 |
| Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | May 30, 2025 | Cross-Lingual TransferPhoneme Recognition | —Unverified | 0 |
| Improving Language and Modality Transfer in Translation by Character-level Modeling | May 30, 2025 | Speech-to-TextSpeech-to-Text Translation | —Unverified | 0 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | May 29, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 0 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | May 24, 2025 | Adversarial AttackSpeech Tokenization | CodeCode Available | 1 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Apr 25, 2025 | Clinical Language TranslationMachine Translation | CodeCode Available | 1 |
| AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation | Mar 18, 2025 | DecoderSpeech-to-Text | —Unverified | 0 |
| Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision | Feb 26, 2025 | Audio SynthesisAutomatic Speech Recognition | —Unverified | 0 |
| Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech RecognitionLanguage Modeling | —Unverified | 0 |
| SparQLe: Speech Queries to Text Translation Through LLMs | Feb 13, 2025 | Speech-to-TextSpeech-to-Text Translation | CodeCode Available | 0 |
| Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation | Feb 1, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Jan 10, 2025 | Automatic Speech RecognitionClassification | CodeCode Available | 0 |
| How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? | Dec 24, 2024 | Simultaneous Speech-to-Text TranslationSpeech-to-Text | —Unverified | 0 |
| Representation Purification for End-to-End Speech Translation | Dec 5, 2024 | Machine TranslationRhythm | —Unverified | 0 |
| Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages | Nov 11, 2024 | DecoderMachine Translation | —Unverified | 0 |
| Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | Oct 31, 2024 | Rhythmspeech-recognition | —Unverified | 0 |
| A Survey on Speech Large Language Models | Oct 24, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model | Oct 24, 2024 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| Unveiling the Role of Pretraining in Direct Speech Translation | Sep 26, 2024 | Automatic Speech RecognitionDecoder | —Unverified | 0 |
| LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Jul 22, 2024 | Data AugmentationLanguage Modeling | CodeCode Available | 1 |
| CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Jul 19, 2024 | Machine TranslationSpeech-to-Text | CodeCode Available | 0 |
| Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Jul 9, 2024 | coreference-resolutionCoreference Resolution | CodeCode Available | 0 |
| Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation | Jul 4, 2024 | Machine Translationspeech-recognition | —Unverified | 0 |
| Investigating Decoder-only Large Language Models for Speech-to-text Translation | Jul 3, 2024 | Decoderparameter-efficient fine-tuning | —Unverified | 0 |
| NAIST Simultaneous Speech Translation System for IWSLT 2024 | Jun 30, 2024 | Speech-to-Speech TranslationSpeech-to-Text | —Unverified | 0 |
| Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects | Jun 27, 2024 | Automatic Speech RecognitionMachine Translation | CodeCode Available | 0 |
| ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Jun 26, 2024 | ArzEn Code-switched Translation to araArzEn Code-switched Translation to eng | CodeCode Available | 1 |
| SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation | Jun 20, 2024 | Speech-to-TextSpeech-to-Text Translation | CodeCode Available | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive LearningSpeech Synthesis | —Unverified | 0 |
| StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection | Jun 10, 2024 | Speech-to-TextSpeech-to-Text Translation | CodeCode Available | 0 |
| StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR)de-en | CodeCode Available | 5 |
| LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions | May 18, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Robust Semantic Communications for Speech Transmission | Mar 8, 2024 | Generative Adversarial NetworkSemantic Communication | —Unverified | 0 |
| Compact Speech Translation Models via Discrete Speech Units Pretraining | Feb 29, 2024 | DecoderSelf-Supervised Learning | —Unverified | 0 |
| Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | Feb 19, 2024 | Speech-to-TextSpeech-to-Text Translation | —Unverified | 0 |
| Pushing the Limits of Zero-shot End-to-End Speech Translation | Feb 16, 2024 | Speech-to-TextSpeech-to-Text Translation | CodeCode Available | 1 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases | Feb 1, 2024 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Dec 30, 2023 | Speech-to-TextSpeech-to-Text Translation | CodeCode Available | 0 |
| Efficient Monotonic Multihead Attention | Dec 7, 2023 | Simultaneous Speech-to-Text TranslationSpeech-to-Text | —Unverified | 0 |
| End-to-End Speech-to-Text Translation: A Survey | Dec 2, 2023 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning | Nov 3, 2023 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Nov 1, 2023 | Automatic Speech Recognitionspeech-recognition | CodeCode Available | 1 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioningAutomatic Speech Recognition | CodeCode Available | 2 |
| Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach | Oct 6, 2023 | Simultaneous Speech-to-Text TranslationSpeech-to-Text | —Unverified | 0 |
| Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | Oct 5, 2023 | Speech-to-TextSpeech-to-Text Translation | —Unverified | 0 |
| Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing | Sep 27, 2023 | DecoderMachine Translation | —Unverified | 0 |
| SpeechAlign: a Framework for Speech Translation Alignment Evaluation | Sep 20, 2023 | Speech-to-TextSpeech-to-Text Translation | —Unverified | 0 |
| CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders | Sep 14, 2023 | Contrastive LearningKnowledge Distillation | —Unverified | 0 |