pyannote.audio: neural building blocks for speaker diarization
Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill
Code
- github.com/pyannote/pyannote-audio (official, in paper; PyTorch; ★ 9,388)
- github.com/MarvinLvn/voice-type-classifier (★ 50)
- github.com/muskang48/Speaker-Diarization (TensorFlow; ★ 0)
Abstract
We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
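Each building block (voice activity detection, speaker change detection, overlapped speech detection) produces per-frame scores that are turned into regions by thresholding. The sketch below illustrates the general idea with a simplified hysteresis binarizer in pure Python; the `onset`/`offset` parameter names mirror pyannote.audio's terminology, but the function itself is an illustrative assumption, not the library's actual implementation.

```python
def binarize(scores, onset=0.5, offset=0.5):
    """Turn per-frame speech scores into (start, end) frame regions.

    A region opens when a score rises above `onset` and closes when a
    score falls below `offset` (hysteresis thresholding, simplified).
    """
    regions = []
    active = False
    start = 0
    for i, score in enumerate(scores):
        if not active and score > onset:
            active, start = True, i
        elif active and score < offset:
            regions.append((start, i))
            active = False
    if active:  # close a region that is still open at the end
        regions.append((start, len(scores)))
    return regions


# toy scores from a hypothetical voice activity detector
print(binarize([0.1, 0.9, 0.9, 0.2, 0.8, 0.1]))  # → [(1, 3), (4, 5)]
```

In the real toolkit, separate onset and offset thresholds are tuned on a development set together with minimum-duration constraints.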
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| AMI | pyannote (waveform) | DER(%) | 6.0 | — | Unverified |
| AMI | pyannote (MFCC) | DER(%) | 6.3 | — | Unverified |
| DIHARD | pyannote (MFCC) | DER(%) | 10.5 | — | Unverified |
| DIHARD | pyannote (waveform) | DER(%) | 9.9 | — | Unverified |
| DIHARD | Baseline (best published result as of Oct. 2019) | DER(%) | 11.2 | — | Unverified |
| ETAPE | pyannote (MFCC) | DER(%) | 5.6 | — | Unverified |
| ETAPE | Baseline | DER(%) | 7.7 | — | Unverified |
| ETAPE | pyannote (waveform) | DER(%) | 4.9 | — | Unverified |
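The diarization error rate (DER) reported above is the sum of false alarm, missed detection, and speaker confusion time, divided by the total reference speech time. A minimal frame-based sketch, assuming hypothesis speaker labels have already been mapped to reference labels (the official metric additionally finds the optimal one-to-one label mapping and supports a forgiveness collar):

```python
def der(reference, hypothesis):
    """Frame-based diarization error rate.

    `reference` and `hypothesis` are equal-length lists of per-frame
    speaker labels, with None marking non-speech frames.
    """
    assert len(reference) == len(hypothesis)
    total = sum(1 for r in reference if r is not None)
    false_alarm = sum(1 for r, h in zip(reference, hypothesis)
                      if r is None and h is not None)
    missed = sum(1 for r, h in zip(reference, hypothesis)
                 if r is not None and h is None)
    confusion = sum(1 for r, h in zip(reference, hypothesis)
                    if r is not None and h is not None and r != h)
    return (false_alarm + missed + confusion) / total


# 4 reference speech frames: 1 confusion, 1 miss, 1 false alarm → DER = 0.75
print(der(['A', 'A', 'B', 'B', None],
          ['A', 'B', 'B', None, 'C']))  # → 0.75
```

In practice DER is computed on time segments rather than fixed frames; the pyannote.metrics package provides the reference implementation used for numbers like those in the table.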