Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting

2025-06-06

Guillaume Wisniewski, Séverine Guillaume, Clara Rosina Fernández

Abstract

Pretrained speech representations such as wav2vec2 and HuBERT exhibit strong anisotropy, which leads to high similarity between randomly chosen embeddings. Although this property is widely observed, its impact on downstream tasks remains unclear. This work evaluates the impact of anisotropy on keyword spotting for computational documentary linguistics. Using Dynamic Time Warping, we show that, despite anisotropy, wav2vec2 similarity measures effectively identify words without transcription. Our results highlight the robustness of these representations, which capture phonetic structures and generalize across speakers, and underscore the importance of pretraining in learning rich and invariant speech representations.
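
The two quantities mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each utterance is already available as a sequence of frame embeddings (plain lists of floats standing in for wav2vec2 features), that anisotropy is estimated as the average cosine similarity between randomly drawn frame pairs, and that keyword matching uses classic Dynamic Time Warping with cosine distance as the local cost. The function names `mean_random_pair_similarity` and `dtw_distance` are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_random_pair_similarity(frames, n_pairs=1000, seed=0):
    """Rough anisotropy estimate (an assumption of this sketch):
    average cosine similarity over randomly drawn pairs of distinct frames.
    Values near 1 suggest that embeddings occupy a narrow cone."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(frames)), 2)
        total += cosine_similarity(frames[i], frames[j])
    return total / n_pairs


def dtw_distance(query, candidate):
    """Classic DTW between two frame sequences, using cosine distance
    (1 - cosine similarity) as local cost, normalised by total length."""
    n, m = len(query), len(candidate)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(query[i - 1], candidate[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # skip a query frame
                cost[i][j - 1],      # skip a candidate frame
                cost[i - 1][j - 1],  # align both frames
            )
    return cost[n][m] / (n + m)
```

In a keyword spotting setting, a lower `dtw_distance` between a spoken query and a candidate segment would indicate a more likely match. The decision threshold is left open here, since the abstract does not specify one.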

Tasks

Dynamic Time Warping, Keyword Spotting
