Data Efficient Child-Adult Speaker Diarization with Simulated Conversations

2024-09-13

Anfeng Xu, Tiantian Feng, Helen Tager-Flusberg, Catherine Lord, Shrikanth Narayanan

Code

github.com/usc-sail/child-adult-diarization
Official · PyTorch · ★ 18

Abstract

Automating child speech analysis is crucial for applications such as neurocognitive assessments. Speaker diarization, which identifies "who spoke when", is an essential component of the automated analysis. However, publicly available child-adult speaker diarization solutions are scarce due to privacy concerns and a lack of annotated datasets, while manually annotating data for each scenario is both time-consuming and costly. To overcome these challenges, we propose a data-efficient solution by creating simulated child-adult conversations using AudioSet. We then train a Whisper Encoder-based model, achieving strong zero-shot performance on child-adult speaker diarization using real datasets. The model performance improves substantially when fine-tuned with only 30 minutes of real training data, with LoRA further improving the transfer learning performance. The source code and the child-adult speaker diarization model trained on simulated conversations are publicly available.
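
As a rough illustration of what "simulated conversations" could mean in practice, the sketch below stitches single-speaker utterances into one recording with silent gaps and derives frame-level labels (silence, child, adult) that a diarization model could be trained on. It is not the authors' pipeline: the function name `simulate_conversation`, the random turn-taking, the absence of overlapping speech, the 20 ms frame size, and the gap lengths are all assumptions made for illustration, and the utterances are plain lists of samples rather than real AudioSet clips.

```python
import random

SILENCE, CHILD, ADULT = 0, 1, 2  # hypothetical label ids for this sketch


def simulate_conversation(child_utts, adult_utts, n_turns=6, sample_rate=16000,
                          frame_ms=20, max_gap_s=1.0, seed=0):
    """Concatenate randomly chosen utterances with silent gaps between turns.

    Returns (samples, frame_labels); each label covers frame_ms of audio.
    """
    rng = random.Random(seed)
    frame_len = sample_rate * frame_ms // 1000
    samples, sample_labels = [], []

    for _ in range(n_turns):
        # Insert a random-length silent gap before each turn.
        gap = rng.randint(0, int(max_gap_s * sample_rate))
        samples.extend([0.0] * gap)
        sample_labels.extend([SILENCE] * gap)

        # Pick a speaker at random, then one of that speaker's utterances.
        speaker = rng.choice([CHILD, ADULT])
        pool = child_utts if speaker == CHILD else adult_utts
        utt = rng.choice(pool)
        samples.extend(utt)
        sample_labels.extend([speaker] * len(utt))

    # Pad with silence so the recording is a whole number of frames.
    pad = (-len(samples)) % frame_len
    samples.extend([0.0] * pad)
    sample_labels.extend([SILENCE] * pad)

    # Majority label per frame; ties resolve toward the earlier class listed.
    frame_labels = []
    for start in range(0, len(sample_labels), frame_len):
        chunk = sample_labels[start:start + frame_len]
        frame_labels.append(max((SILENCE, CHILD, ADULT), key=chunk.count))
    return samples, frame_labels


if __name__ == "__main__":
    # Toy "utterances": constant-valued signals standing in for real speech.
    child = [[0.1] * 3200]
    adult = [[0.2] * 4800]
    audio, labels = simulate_conversation(child, adult, n_turns=4, seed=1)
    print(len(audio), len(labels))
```

A real setup would replace the toy lists with child and adult speech segments and would likely add noise, level variation, and overlapping turns; the official repository linked above is the place to check how the authors handle those details.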

Tasks

Speaker Diarization, Transfer Learning
