
Towards Explainable Spoofed Speech Attribution and Detection: A Probabilistic Approach for Characterizing Speech Synthesizer Components

2025-02-06

Jagabandhu Mishra, Manasi Chhibber, Hye-jin Shim, Tomi H. Kinnunen


Abstract

We propose an explainable probabilistic framework for characterizing spoofed speech by decomposing it into probabilistic attribute embeddings. Unlike raw high-dimensional countermeasure (CM) embeddings, which lack interpretability, the proposed probabilistic attribute embeddings aim to detect specific speech synthesizer components, represented through high-level attributes and their corresponding values. We use these probabilistic embeddings with four classifier back-ends to address two downstream tasks: spoofing detection and spoofing attack attribution. The former is the well-known bonafide-versus-spoof detection task, whereas the latter seeks to identify the source method (generator) of a spoofed utterance. We additionally use Shapley values, a widely used technique in machine learning, to quantify the relative contribution of each attribute value to the decision-making process in each task. Results on the ASVspoof2019 dataset demonstrate the substantial role of duration and conversion modeling in spoofing detection, and of waveform generation and speaker modeling in spoofing attack attribution. In the detection task, the probabilistic attribute embeddings achieve 99.7% balanced accuracy and 0.22% equal error rate (EER), closely matching the performance of raw embeddings (99.9% balanced accuracy and 0.22% EER). Similarly, in the attribution task, our embeddings achieve 90.23% balanced accuracy and 2.07% EER, compared to 90.16% and 2.11% with raw embeddings. These results demonstrate that the proposed framework is both explainable by design and capable of achieving performance comparable to raw CM embeddings.
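To make the Shapley-value step concrete, the following is a minimal sketch of exact Shapley attribution over a handful of attribute features. It is not the authors' implementation: the linear scorer, the weights, the input vector, the baseline, and every name in the code are hypothetical and chosen only for illustration. Features outside a coalition are replaced by a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over n_features players.

    value_fn takes a frozenset of feature indices (the coalition) and
    returns the model output when only those features are "present".
    Cost grows exponentially with n_features, so this suits toy sizes only.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy setup: three attribute-value probabilities and a linear scorer.
WEIGHTS = [2.0, -1.0, 0.5]   # assumed back-end weights (illustrative only)
X = [0.9, 0.2, 0.6]          # assumed attribute probabilities for one utterance
BASELINE = [0.5, 0.5, 0.5]   # assumed reference values for "absent" features


def score(features):
    """Linear detection score over the attribute features."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


def value_fn(coalition):
    """Score with features outside the coalition replaced by the baseline."""
    masked = [X[j] if j in coalition else BASELINE[j] for j in range(len(X))]
    return score(masked)


phi = shapley_values(value_fn, len(X))
for index, contribution in enumerate(phi):
    print(f"attribute {index}: contribution {contribution:+.3f}")
```

For a linear scorer, each contribution reduces to the weight times the feature's deviation from its baseline, and the contributions sum to the difference between the score of the input and the score of the baseline. Non-linear back-ends generally need sampling-based approximations, because the exact sum enumerates every coalition.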
