| Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information | Dec 7, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction | Oct 28, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach | Oct 25, 2022 | Representation Learning, Speaker Recognition | Unverified | 0 | 0 |
| Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement | Nov 12, 2022 | Data Augmentation, Emotion Recognition | Unverified | 0 | 0 |
| Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation | Jun 17, 2019 | Clustering, Representation Learning | Unverified | 0 | 0 |
| INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition | May 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| JOOCI: a Framework for Learning Comprehensive Speech Representations | Oct 14, 2024 | Representation Learning, Speech Representation Learning | Unverified | 0 | 0 |
| Label Aware Speech Representation Learning For Language Identification | Jun 7, 2023 | Language Identification, Missing Labels | Unverified | 0 | 0 |
| Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks | Mar 9, 2022 | Representation Learning, speech-recognition | Unverified | 0 | 0 |
| A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning | Jun 3, 2020 | Representation Learning, Self-Supervised Learning | Unverified | 0 | 0 |
| Learning Cross-lingual Visual Speech Representations | Mar 14, 2023 | Representation Learning, Self-Supervised Learning | Unverified | 0 | 0 |
| Learning Disentangled Speech Representations | Nov 4, 2023 | Benchmarking, Disentanglement | Unverified | 0 | 0 |
| Learning Robust and Multilingual Speech Representations | Jan 29, 2020 | Representation Learning, speech-recognition | Unverified | 0 | 0 |
| Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation | Mar 24, 2022 | Representation Learning, Speech Representation Learning | Unverified | 0 | 0 |
| Towards the Next Frontier in Speech Representation Learning Using Disentanglement | Jul 2, 2024 | Disentanglement, Representation Learning | Unverified | 0 | 0 |
| MASR: Multi-label Aware Speech Representation | Jul 20, 2023 | Emotion Recognition, Language Identification | Unverified | 0 | 0 |
| Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning | Oct 28, 2019 | Clustering, Phoneme Recognition | Unverified | 0 | 0 |
| VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning | Nov 21, 2022 | Audio-Visual Speech Recognition, Language Modelling | Unverified | 0 | 0 |
| On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation | Jul 6, 2023 | Keyword Spotting, Knowledge Distillation | Unverified | 0 | 0 |
| On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding | Oct 11, 2022 | Representation Learning, Sentence | Unverified | 0 | 0 |
| PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition | Jun 10, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Privacy-preserving Representation Learning for Speech Understanding | Oct 26, 2023 | Classification, Emotion Recognition | Unverified | 0 | 0 |
| Privacy-Preserving Speech Representation Learning using Vector Quantization | Mar 15, 2022 | Privacy Preserving, Quantization | Unverified | 0 | 0 |
| Progressive Residual Extraction based Pre-training for Speech Representation Learning | Aug 31, 2024 | Emotion Recognition, Representation Learning | Unverified | 0 | 0 |
| TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition | May 23, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion | Nov 14, 2023 | Deep Learning, Diversity | Unverified | 0 | 0 |
| Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective | Jan 16, 2024 | Representation Learning, Self-Supervised Learning | Unverified | 0 | 0 |
| A Brief Overview of Unsupervised Neural Speech Representation Learning | Mar 1, 2022 | Representation Learning, Speech Representation Learning | Unverified | 0 | 0 |
| Wav2vec-C: A Self-supervised Model for Speech Representation Learning | Mar 9, 2021 | Quantization, Representation Learning | Unverified | 0 | 0 |
| A Comparison of Discrete Latent Variable Models for Speech Representation Learning | Oct 24, 2020 | Phoneme Recognition, Representation Learning | Unverified | 0 | 0 |
| Robust Speaker Recognition with Transformers Using wav2vec 2.0 | Mar 28, 2022 | Data Augmentation, Representation Learning | Unverified | 0 | 0 |
| Robust Speech Representation Learning via Flow-based Embedding Regularization | Dec 7, 2021 | Deep Learning, Language Identification | Unverified | 0 | 0 |
| SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation | May 17, 2022 | Representation Learning, Retrieval | Unverified | 0 | 0 |
| Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound | Aug 14, 2020 | Contrastive Learning, Gaze Prediction | Unverified | 0 | 0 |
| Self-supervised models of audio effectively explain human cortical responses to speech | May 27, 2022 | Representation Learning, Speech Representation Learning | Unverified | 0 | 0 |
| Self-Supervised Speech Representation Learning: A Review | May 21, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Self-supervised speech representation learning for keyword-spotting with light-weight transformers | Mar 7, 2023 | Keyword Spotting, Representation Learning | Unverified | 0 | 0 |
| UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization | Jan 26, 2024 | Decoder, Domain Adaptation | Unverified | 0 | 0 |
| Similarity Analysis of Self-Supervised Speech Representations | Oct 22, 2020 | Representation Learning, Speech Representation Learning | Unverified | 0 | 0 |
| Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System | Jun 5, 2023 | Multi-Task Learning, Representation Learning | Unverified | 0 | 0 |
| Universal Semantic Disentangled Privacy-preserving Speech Representation Learning | May 19, 2025 | Decoder, Privacy Preserving | Unverified | 0 | 0 |
| Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio | Oct 17, 2023 | Representation Learning, Self-Supervised Learning | Unverified | 0 | 0 |
| Speech representation learning: Learning bidirectional encoders with single-view, multi-view, and multi-task methods | Jul 25, 2023 | MULTI-VIEW LEARNING, Representation Learning | Unverified | 0 | 0 |
| Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation | Aug 20, 2024 | Data Augmentation, Representation Learning | Unverified | 0 | 0 |
| Adversarially learning disentangled speech representations for robust multi-factor voice conversion | Jan 30, 2021 | Representation Learning, Rhythm | Unverified | 0 | 0 |
| An empirical study on speech restoration guided by self supervised speech representation | May 30, 2023 | Representation Learning, Speech Representation Learning | Unverified | 0 | 0 |
| A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition | Jan 22, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning | Oct 18, 2021 | Multi-Task Learning, Representation Learning | Unverified | 0 | 0 |
| Application of Knowledge Distillation to Multi-task Speech Representation Learning | Oct 29, 2022 | Keyword Spotting, Knowledge Distillation | Unverified | 0 | 0 |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | Sep 21, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 | 0 |