FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 4, 2024 | Emotion Recognition, Event Detection | Code available | 11
Robust Speech Recognition via Large-Scale Weak Supervision | Dec 6, 2022 | Robust Speech Recognition, speech-recognition | Code available | 8
AudioLM: a Language Modeling Approach to Audio Generation | Sep 7, 2022 | Audio Generation | Code available | 7
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code available | 5
High-Fidelity Simultaneous Speech-To-Speech Translation | Feb 5, 2025 | Decoder, Simultaneous Speech-to-Speech Translation | Code available | 5
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context Learning, Language Modeling | Code available | 5
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 2
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | May 28, 2024 | Machine Translation, speech-recognition | Code available | 2
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder, Simultaneous Speech-to-Speech Translation | Code available | 2
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation | Jan 11, 2022 | Sentence, Speech-to-Speech Translation | Code available | 2
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators | Feb 10, 2024 | Machine Translation, Speech-to-Speech Translation | Code available | 2
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Aug 22, 2023 | Automatic Speech Recognition, Machine Translation | Code available | 2
Direct speech-to-speech translation with discrete units | Jul 12, 2021 | Speech-to-Speech Translation, Text Generation | Code available | 1
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation | Dec 5, 2023 | Self-Supervised Learning, Speech-to-Speech Translation | Code available | 1
EmphAssess: a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models | Dec 21, 2023 | Resynthesis, Speech-to-Speech Translation | Code available | 1
Towards Automatic Face-to-Face Translation | Mar 1, 2020 | Face to Face Translation, Machine Translation | Code available | 1
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation | May 25, 2022 | Representation Learning, Rhythm | Code available | 1
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models | Jun 1, 2023 | Simultaneous Speech-to-Speech Translation, Speech-to-Speech Translation | Code available | 1
CTC-based Non-autoregressive Textless Speech-to-Speech Translation | Jun 11, 2024 | Knowledge Distillation, Machine Translation | Code available | 1
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation | Oct 11, 2023 | Decoder, fr-en | Code available | 1
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Jul 17, 2024 | Speech-to-Speech Translation, text-to-speech | Code available | 1
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation | May 18, 2022 | Speech-to-Speech Translation, Translation | Code available | 1
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Aug 3, 2023 | Decoder, Quantization | Code available | 1
Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention | Oct 15, 2021 | Simultaneous Speech-to-Speech Translation, Speech Synthesis | Unverified | 0
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | Unverified | 0
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation | Jan 25, 2023 | Speech-to-Speech Translation, Translation | Unverified | 0
Direct Speech-to-Speech Neural Machine Translation: A Survey | Nov 13, 2024 | Machine Translation, Speech-to-Speech Translation | Unverified | 0
Direct Punjabi to English speech translation using discrete units | Feb 25, 2024 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation | Jun 14, 2024 | Speech-to-Speech Translation, Translation | Unverified | 0
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation | May 24, 2023 | Speech-to-Speech Translation, Translation | Unverified | 0
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation | Oct 26, 2023 | Image Generation, Speech-to-Speech Translation | Unverified | 0
Findings of the IWSLT 2024 Evaluation Campaign | Nov 7, 2024 | Speech-to-Speech Translation, Translation | Unverified | 0
Assessing Evaluation Metrics for Speech-to-Speech Translation | Oct 26, 2021 | Machine Translation, Open-Ended Question Answering | Unverified | 0
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs | Jun 12, 2025 | Speech-to-Speech Translation, text-to-speech | Unverified | 0
i-Code Studio: A Configurable and Composable Framework for Integrative AI | May 23, 2023 | Question Answering, Retrieval | Unverified | 0
Automatic Extraction of Parallel Speech Corpora from Dubbed Movies | Aug 1, 2017 | Speech-to-Speech Translation, Translation | Unverified | 0
Evaluating MT Systems: A Theoretical Framework | Feb 11, 2022 | Machine Translation, Speech-to-Speech Translation | Unverified | 0
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation | Feb 1, 2025 | Speech-to-Speech Translation, Translation | Unverified | 0
Enhancing Speech-to-Speech Translation with Multiple TTS Targets | Apr 10, 2023 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation | Apr 6, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Enhancing expressivity transfer in textless speech-to-speech translation | Oct 11, 2023 | Self-Supervised Learning, Speech-to-Speech Translation | Unverified | 0
A Case Study on Filtering for End-to-End Speech Translation | Feb 2, 2024 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0
German-Arabic Speech-to-Speech Translation for Psychiatric Diagnosis | Dec 1, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Phonology-Guided Speech-to-Speech Translation for African Languages | Oct 30, 2024 | Semantic Similarity, Semantic Textual Similarity | Unverified | 0
Findings of the IWSLT 2022 Evaluation Campaign | May 1, 2022 | Speech-to-Speech Translation, Translation | Unverified | 0
Ellipsis Translation for a Medical Speech to Speech Translation System | Nov 1, 2020 | Diagnostic, Speech-to-Speech Translation | Unverified | 0
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training | Oct 20, 2020 | Sentence, Simultaneous Speech-to-Speech Translation | Unverified | 0
From Speech-to-Speech Translation to Automatic Dubbing | Jan 19, 2020 | Machine Translation, Speech-to-Speech Translation | Unverified | 0
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning | May 23, 2024 | es-en, fr-en | Unverified | 0