FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition Event Detection | Code Available | 11
Robust Speech Recognition via Large-Scale Weak Supervision | Dec 6, 2022 | Robust Speech Recognition speech-recognition | Code Available | 8
AudioLM: a Language Modeling Approach to Audio Generation | Sep 7, 2022 | Audio Generation | Code Available | 7
High-Fidelity Simultaneous Speech-To-Speech Translation | Feb 5, 2025 | Decoder Simultaneous Speech-to-Speech Translation | Code Available | 5
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR) de-en | Code Available | 5
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context Learning Language Modeling | Code Available | 5
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder Simultaneous Speech-to-Speech Translation | Code Available | 2
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | May 28, 2024 | Machine Translation speech-recognition | Code Available | 2
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators | Feb 10, 2024 | Machine Translation Speech-to-Speech Translation | Code Available | 2
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Aug 22, 2023 | Automatic Speech Recognition Machine Translation | Code Available | 2
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech Recognition Automatic Speech Recognition (ASR) | Code Available | 2
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation | Jan 11, 2022 | Sentence Speech-to-Speech Translation | Code Available | 2
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Jul 17, 2024 | Speech-to-Speech Translation text-to-speech | Code Available | 1
CTC-based Non-autoregressive Textless Speech-to-Speech Translation | Jun 11, 2024 | Knowledge Distillation Machine Translation | Code Available | 1
EmphAssess: a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models | Dec 21, 2023 | Resynthesis Speech-to-Speech Translation | Code Available | 1
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation | Dec 5, 2023 | Self-Supervised Learning Speech-to-Speech Translation | Code Available | 1
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation | Oct 11, 2023 | Decoder fr-en | Code Available | 1
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Aug 3, 2023 | Decoder Quantization | Code Available | 1
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models | Jun 1, 2023 | Simultaneous Speech-to-Speech Translation Speech-to-Speech Translation | Code Available | 1
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation | May 25, 2022 | Representation Learning Rhythm | Code Available | 1
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation | May 18, 2022 | Speech-to-Speech Translation Translation | Code Available | 1
Direct speech-to-speech translation with discrete units | Jul 12, 2021 | Speech-to-Speech Translation Text Generation | Code Available | 1
Towards Automatic Face-to-Face Translation | Mar 1, 2020 | Face to Face Translation Machine Translation | Code Available | 1
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation text-to-speech | Unverified | 0
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension Speech Synthesis | Unverified | 0
Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation | Jun 4, 2025 | Language Modeling Language Modelling | Unverified | 0
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing | May 27, 2025 | Speech-to-Speech Translation Translation | Unverified | 0
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation | May 21, 2025 | Language Modeling Language Modelling | Code Available | 0
Language translation, and change of accent for speech-to-speech task using diffusion model | May 4, 2025 | Speech-to-Speech Translation Translation | Unverified | 0
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation | Apr 22, 2025 | Simultaneous Speech-to-Speech Translation Speech-to-Speech Translation | Unverified | 0
Using Phonemes in cascaded S2S translation pipeline | Apr 22, 2025 | Simultaneous Speech-to-Speech Translation Speech-to-Speech Translation | Code Available | 0
Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition Automatic Speech Recognition (ASR) | Unverified | 0
Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus | Feb 25, 2025 | Speech-to-Speech Translation Translation | Unverified | 0
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition Language Modeling | Unverified | 0
Speech to Speech Translation with Translatotron: A State of the Art Review | Feb 9, 2025 | speech-recognition Speech Recognition | Unverified | 0
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation | Feb 1, 2025 | Speech-to-Speech Translation Translation | Unverified | 0
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation | Dec 21, 2024 | Speech-to-Speech Translation Translation | Unverified | 0
Direct Speech-to-Speech Neural Machine Translation: A Survey | Nov 13, 2024 | Machine Translation Speech-to-Speech Translation | Unverified | 0
Findings of the IWSLT 2024 Evaluation Campaign | Nov 7, 2024 | Speech-to-Speech Translation Translation | Unverified | 0
Phonology-Guided Speech-to-Speech Translation for African Languages | Oct 30, 2024 | Semantic Similarity Semantic Textual Similarity | Unverified | 0
Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens | Oct 4, 2024 | Language Modeling Language Modelling | Unverified | 0
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection | Sep 17, 2024 | Emotion Recognition Speech Emotion Recognition | Code Available | 0
What does it take to get state of the art in simultaneous speech-to-speech translation? | Sep 2, 2024 | Hallucination Management | Unverified | 0
PolySinger: Singing-Voice to Singing-Voice Translation from English to Japanese | Jul 19, 2024 | Singing Voice Synthesis Speech-to-Speech Translation | Unverified | 0
Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems | Jul 18, 2024 | Speech-to-Speech Translation Voice Cloning | Unverified | 0
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation | Jul 8, 2024 | Automatic Speech Recognition Emotion Recognition | Unverified | 0
NAIST Simultaneous Speech Translation System for IWSLT 2024 | Jun 30, 2024 | Speech-to-Speech Translation Speech-to-Text | Unverified | 0
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation | Jun 14, 2024 | Speech-to-Speech Translation Translation | Unverified | 0
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning Speech Synthesis | Unverified | 0
SimulTron: On-Device Simultaneous Speech to Speech Translation | Jun 4, 2024 | Simultaneous Speech-to-Speech Translation Speech-to-Speech Translation | Unverified | 0