Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models

2025-06-17

Tuan Dat Phuong, Long-Vu Hoang, Huy Dat Tran

Code

github.com/HuSTeP-Human-Speech-Text-Processing-Lab/XLSR-GRKAN-Conformer
Official implementation

Abstract

Recent advances in speech synthesis have enabled increasingly sophisticated spoofing attacks, which pose significant challenges for automatic speaker verification systems. Systems based on self-supervised learning (SSL) models, particularly the XLSR-Conformer model, have demonstrated remarkable performance in synthetic speech detection, but there remains room for architectural improvement. In this paper, we propose replacing the traditional Multi-Layer Perceptron in the XLSR-Conformer model with a Kolmogorov-Arnold Network (KAN), a recent architecture based on the Kolmogorov-Arnold representation theorem. Our results on ASVspoof2021 demonstrate that integrating KAN into SSL-based models yields a relative performance improvement of 60.55% on the LA and DF sets and achieves an equal error rate (EER) of 0.70% on the 21LA set. These findings suggest that incorporating KAN into SSL-based models is a promising direction for synthetic speech detection.
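The abstract reports its results as an equal error rate (EER) and as a relative improvement. As a rough illustration of how such figures are typically computed, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names, the score convention (higher means more likely bona fide), and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate (EER) as a fraction in [0, 1].

    Assumed convention: a higher score means "more likely bona fide".
    A trial is accepted when its score is >= the threshold.
    """
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is taken where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, in percent, of new_eer over baseline_eer."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy scores (hypothetical, not taken from the paper).
bonafide = [0.9, 0.8, 0.3, 0.7]
spoof = [0.1, 0.2, 0.4, 0.6]
print(compute_eer(bonafide, spoof))    # 0.25
print(relative_reduction(2.0, 1.0))    # 50.0
```

Under this reading, a relative improvement expresses the drop in EER as a percentage of the baseline system's EER, so it depends on baseline figures that the abstract does not state.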

Tasks

Kolmogorov-Arnold Networks, Self-Supervised Learning, Speaker Verification, Speech Synthesis, Synthetic Speech Detection
