| AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation | Apr 29, 2025 | In-Context Learning, Speech Synthesis | —Unverified | 0 |
| DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | Oct 14, 2024 | Denoising, Speaker Verification | —Unverified | 0 |
| Disentangling segmental and prosodic factors to non-native speech comprehensibility | Aug 20, 2024 | Quantization, Voice Similarity | —Unverified | 0 |
| VoxSim: A perceptual voice similarity dataset | Jul 26, 2024 | Benchmarking, Speaker Recognition | Code Available | 1 |
| SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models | Jun 12, 2024 | Voice Conversion, Voice Similarity | —Unverified | 0 |
| Singer Identity Representation Learning using Self-Supervised Techniques | Jan 10, 2024 | Domain Generalization, Representation Learning | Code Available | 2 |
| YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone | Dec 4, 2021 | Speech Synthesis, Text-To-Speech Synthesis | Code Available | 1 |
| SVSNet: An End-to-end Speaker Voice Similarity Assessment Model | Jul 20, 2021 | Voice Conversion, Voice Similarity | Code Available | 0 |
| DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion | May 28, 2021 | Denoising, Voice Conversion | —Unverified | 0 |
| An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion | Apr 25, 2021 | Generative Adversarial Network, Speech Synthesis | —Unverified | 0 |