Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation; text-to-speech | Unverified | 0
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension; Speech Synthesis | Unverified | 0
Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation | Jun 4, 2025 | Language Modeling; Language Modelling | Unverified | 0
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing | May 27, 2025 | Speech-to-Speech Translation; Translation | Unverified | 0
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation | May 21, 2025 | Language Modeling; Language Modelling | Code Available | 0
Language translation, and change of accent for speech-to-speech task using diffusion model | May 4, 2025 | Speech-to-Speech Translation; Translation | Unverified | 0
Using Phonemes in cascaded S2S translation pipeline | Apr 22, 2025 | Simultaneous Speech-to-Speech Translation; Speech-to-Speech Translation | Code Available | 0
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation | Apr 22, 2025 | Simultaneous Speech-to-Speech Translation; Speech-to-Speech Translation | Unverified | 0
Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition; Automatic Speech Recognition (ASR) | Unverified | 0
Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus | Feb 25, 2025 | Speech-to-Speech Translation; Translation | Unverified | 0
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition; Language Modeling | Unverified | 0
Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognition; Speech Recognition | Unverified | 0
High-Fidelity Simultaneous Speech-To-Speech Translation | Feb 5, 2025 | Decoder; Simultaneous Speech-to-Speech Translation | Code Available | 5
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation | Feb 1, 2025 | Speech-to-Speech Translation; Translation | Unverified | 0
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation | Dec 21, 2024 | Speech-to-Speech Translation; Translation | Unverified | 0
Direct Speech-to-Speech Neural Machine Translation: A Survey | Nov 13, 2024 | Machine Translation; Speech-to-Speech Translation | Unverified | 0
Findings of the IWSLT 2024 Evaluation Campaign | Nov 7, 2024 | Speech-to-Speech Translation; Translation | Unverified | 0
Phonology-Guided Speech-to-Speech Translation for African Languages | Oct 30, 2024 | Semantic Similarity; Semantic Textual Similarity | Unverified | 0
Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens | Oct 4, 2024 | Language Modeling; Language Modelling | Unverified | 0
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection | Sep 17, 2024 | Emotion Recognition; Speech Emotion Recognition | Code Available | 0
What does it take to get state of the art in simultaneous speech-to-speech translation? | Sep 2, 2024 | Hallucination; Management | Unverified | 0
PolySinger: Singing-Voice to Singing-Voice Translation from English to Japanese | Jul 19, 2024 | Singing Voice Synthesis; Speech-to-Speech Translation | Unverified | 0
Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems | Jul 18, 2024 | Speech-to-Speech Translation; Voice Cloning | Unverified | 0
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Jul 17, 2024 | Speech-to-Speech Translation; text-to-speech | Code Available | 1
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation | Jul 8, 2024 | Automatic Speech Recognition; Emotion Recognition | Unverified | 0
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition; Event Detection | Code Available | 11
NAIST Simultaneous Speech Translation System for IWSLT 2024 | Jun 30, 2024 | Speech-to-Speech Translation; Speech-to-Text | Unverified | 0
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation | Jun 14, 2024 | Speech-to-Speech Translation; Translation | Unverified | 0
CTC-based Non-autoregressive Textless Speech-to-Speech Translation | Jun 11, 2024 | Knowledge Distillation; Machine Translation | Code Available | 1
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder; Simultaneous Speech-to-Speech Translation | Code Available | 2
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning; Speech Synthesis | Unverified | 0
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR); de-en | Code Available | 5
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing | Jun 4, 2024 | Decoder; Language Modeling | Unverified | 0
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation | Jun 4, 2024 | Speech-to-Speech Translation; Translation | Unverified | 0
SimulTron: On-Device Simultaneous Speech to Speech Translation | Jun 4, 2024 | Simultaneous Speech-to-Speech Translation; Speech-to-Speech Translation | Unverified | 0
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought | May 30, 2024 | Language Modeling; Language Modelling | Unverified | 0
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | May 28, 2024 | Machine Translation; speech-recognition | Code Available | 2
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning | May 23, 2024 | es-en; fr-en | Unverified | 0
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation | May 22, 2024 | Denoising; Noise Estimation | Code Available | 0
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation | Mar 19, 2024 | Decoder; Language Modeling | Unverified | 0
Direct Punjabi to English speech translation using discrete units | Feb 25, 2024 | Speech-to-Speech Translation; Speech-to-Text | Unverified | 0
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators | Feb 10, 2024 | Machine Translation; Speech-to-Speech Translation | Code Available | 2
A Case Study on Filtering for End-to-End Speech Translation | Feb 2, 2024 | Speech-to-Speech Translation; Speech-to-Text | Unverified | 0
TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data | Jan 17, 2024 | Sentence; Speech-to-Speech Translation | Unverified | 0
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation | Dec 23, 2023 | es-en; fr-en | Unverified | 0
EmphAssess: a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models | Dec 21, 2023 | Resynthesis; Speech-to-Speech Translation | Code Available | 1
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation | Dec 5, 2023 | Self-Supervised Learning; Speech-to-Speech Translation | Code Available | 1
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation | Oct 26, 2023 | Image Generation; Speech-to-Speech Translation | Unverified | 0
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation | Oct 11, 2023 | Decoder; fr-en | Code Available | 1
Enhancing expressivity transfer in textless speech-to-speech translation | Oct 11, 2023 | Self-Supervised Learning; Speech-to-Speech Translation | Unverified | 0