Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement
2025-06-19
Tuan-Nam Nguyen, Ngoc-Quan Pham, Seymanur Aktı, Alexander Waibel
Abstract
We propose the first streaming accent conversion (AC) model, which transforms non-native speech into a native-like accent while preserving speaker identity and prosody and improving pronunciation. Our approach enables streaming processing by modifying a previous AC architecture with an Emformer encoder and an optimized inference mechanism. Additionally, we integrate a native text-to-speech (TTS) model to generate ideal ground-truth data for efficient training. Our streaming AC model achieves performance comparable to top AC models while maintaining stable latency, making it the first AC system capable of streaming.
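Streaming encoders in the Emformer family consume audio in fixed-size segments, each with a small right-context lookahead. This is one reason a streaming system can keep latency stable. The sketch below illustrates only that segmentation and the resulting algorithmic latency. It does not implement Emformer's attention or memory bank. The function names, segment length, lookahead size, and 10 ms frame shift are hypothetical illustration values, not settings reported in the paper.

```python
from typing import Iterator, List, Sequence, Tuple


def stream_segments(
    frames: Sequence[float],
    segment_length: int,
    right_context: int,
) -> Iterator[Tuple[List[float], List[float]]]:
    """Hypothetical helper: yield (segment, lookahead) pairs in arrival order.

    Each segment holds up to `segment_length` new frames. The lookahead holds
    up to `right_context` frames that follow it, and is shorter or empty at
    the end of the stream.
    """
    for start in range(0, len(frames), segment_length):
        end = start + segment_length
        segment = list(frames[start:end])
        lookahead = list(frames[end:end + right_context])
        yield segment, lookahead


def algorithmic_latency_ms(
    segment_length: int, right_context: int, frame_shift_ms: float
) -> float:
    """Hypothetical helper: worst-case wait before a segment can be processed.

    The whole segment plus its lookahead must have arrived. Model compute
    time is not included, so this is a lower bound on end-to-end latency.
    """
    return (segment_length + right_context) * frame_shift_ms


if __name__ == "__main__":
    # Ten dummy frames, processed in segments of 4 with 2 frames of lookahead.
    for seg, look in stream_segments(list(range(10)), segment_length=4, right_context=2):
        print(seg, look)
    # Assumed values: 16-frame segments, 4-frame lookahead, 10 ms frame shift.
    print(algorithmic_latency_ms(16, 4, 10.0))
```

The latency depends only on the segment and lookahead sizes, not on the length of the utterance. A larger lookahead gives the encoder more future context but increases the fixed delay.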