FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Jul 4, 2024 Emotion Recognition Event Detection
Code Code Available 115 Robust Speech Recognition via Large-Scale Weak Supervision Dec 6, 2022 Robust Speech Recognition speech-recognition
Code Code Available 85 AudioLM: a Language Modeling Approach to Audio Generation Sep 7, 2022 Audio Generation
Code Code Available 75 StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning Jun 5, 2024 Automatic Speech Recognition (ASR) de-en
Code Code Available 55 High-Fidelity Simultaneous Speech-To-Speech Translation Feb 5, 2025 Decoder Simultaneous Speech-to-Speech Translation
Code Code Available 55 Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling Mar 7, 2023 In-Context Learning Language Modeling
Code Code Available 55 SeamlessM4T: Massively Multilingual & Multimodal Machine Translation Aug 22, 2023 Automatic Speech Recognition Machine Translation
Code Code Available 25 CVSS Corpus and Massively Multilingual Speech-to-Speech Translation Jan 11, 2022 Sentence Speech-to-Speech Translation
Code Code Available 25 GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators Feb 10, 2024 Machine Translation Speech-to-Speech Translation
Code Code Available 25 A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation Jun 11, 2024 Decoder Simultaneous Speech-to-Speech Translation
Code Code Available 25 TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation May 28, 2024 Machine Translation speech-recognition
Code Code Available 25 BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric Dec 16, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Direct speech-to-speech translation with discrete units Jul 12, 2021 Speech-to-Speech Translation Text Generation
Code Code Available 15 DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation Oct 11, 2023 Decoder fr-en
Code Code Available 15 AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation Dec 5, 2023 Self-Supervised Learning Speech-to-Speech Translation
Code Code Available 15 Towards Automatic Face-to-Face Translation Mar 1, 2020 Face to Face Translation Machine Translation
Code Code Available 15 TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation May 25, 2022 Representation Learning Rhythm
Code Code Available 15 CTC-based Non-autoregressive Textless Speech-to-Speech Translation Jun 11, 2024 Knowledge Distillation Machine Translation
Code Code Available 15 Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation Aug 3, 2023 Decoder Quantization
Code Code Available 15 Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation May 18, 2022 Speech-to-Speech Translation Translation
Code Code Available 15 Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models Jun 1, 2023 Simultaneous Speech-to-Speech Translation Speech-to-Speech Translation
Code Code Available 15 EmphAssess: a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models Dec 21, 2023 Resynthesis Speech-to-Speech Translation
Code Code Available 15 Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech Jul 17, 2024 Speech-to-Speech Translation text-to-speech
Code Code Available 15 A Textless Metric for Speech-to-Speech Comparison Oct 21, 2022 Sentence Speech-to-Speech Translation
Code Code Available 05 Textless Speech-to-Speech Translation With Limited Parallel Data May 24, 2023 Automatic Speech Recognition Denoising
Code Code Available 05 Using Phonemes in cascaded S2S translation pipeline Apr 22, 2025 Simultaneous Speech-to-Speech Translation Speech-to-Speech Translation
Code Code Available 05 UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units Dec 15, 2022 Decoder Denoising
Code Code Available 05 DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation May 22, 2024 Denoising Noise Estimation
Code Code Available 05 Dialogs Re-enacted Across Languages Nov 18, 2022 Speech-to-Speech Translation Translation
Code Code Available 05 ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit Apr 10, 2023 Benchmarking Simultaneous Speech-to-Text Translation
Code Code Available 05 Towards cross-language prosody transfer for dialog Jul 9, 2023 Speech-to-Speech Translation Translation
Code Code Available 05 ESPnet-ST: All-in-One Speech Translation Toolkit Apr 21, 2020 All Automatic Speech Recognition
Code Code Available 05 Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022 May 1, 2022 Decoder Knowledge Distillation
Code Code Available 05 Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation May 21, 2025 Language Modeling Language Modelling
Code Code Available 05 LibriS2S: A German-English Speech-to-Speech Translation Corpus Apr 22, 2022 Speech-to-Speech Translation Speech-to-Text
Code Code Available 05 Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection Sep 17, 2024 Emotion Recognition Speech Emotion Recognition
Code Code Available 05 Direct speech-to-speech translation with a sequence-to-sequence model Apr 12, 2019 Speech Synthesis Speech-to-Speech Translation
Code Code Available 05 Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation Oct 31, 2022 Speech-to-Speech Translation Translation
Code Code Available 05 Multimodal and Multilingual Embeddings for Large-Scale Speech Mining Dec 1, 2021 Speech-to-Speech Translation Translation
Code Code Available 05 Direct Speech to Speech Translation: A Review Mar 3, 2025 Automatic Speech Recognition Automatic Speech Recognition (ASR)
— Unverified 00 Direct Speech-to-Speech Neural Machine Translation: A Survey Nov 13, 2024 Machine Translation Speech-to-Speech Translation
— Unverified 00 Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention Oct 15, 2021 Simultaneous Speech-to-Speech Translation Speech Synthesis
— Unverified 00 Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM Feb 24, 2025 Automatic Speech Recognition Language Modeling
— Unverified 00 Direct Punjabi to English speech translation using discrete units Feb 25, 2024 Speech-to-Speech Translation Speech-to-Text
— Unverified 00 Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation Jun 14, 2024 Speech-to-Speech Translation Translation
— Unverified 00 AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation May 24, 2023 Speech-to-Speech Translation Translation
— Unverified 00 From Speech-to-Speech Translation to Automatic Dubbing Jan 19, 2020 Machine Translation Speech-to-Speech Translation
— Unverified 00 Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training Oct 20, 2020 Sentence Simultaneous Speech-to-Speech Translation
— Unverified 00 Findings of the IWSLT 2024 Evaluation Campaign Nov 7, 2024 Speech-to-Speech Translation Translation
— Unverified 00 DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation Oct 26, 2023 Image Generation Speech-to-Speech Translation