SOTAVerified

Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
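The definition above says that diarization co-indexes segments rather than identifying known speakers. A minimal sketch of what that means for a system's output follows. The segment format and the helper names (`num_speakers`, `canonicalize`, `same_diarization`) are illustrative assumptions, not taken from the cited paper or any library. It assumes no two segments share identical start and end times.

```python
from typing import Dict, List, Tuple

# Hypothetical output format: (start_sec, end_sec, speaker_label).
# Label names are arbitrary; only "same label = same speaker" matters.
Segment = Tuple[float, float, str]


def num_speakers(segments: List[Segment]) -> int:
    """Number of distinct speakers, obtained as a by-product of the grouping."""
    return len({label for _, _, label in segments})


def canonicalize(segments: List[Segment]) -> List[Segment]:
    """Rename speakers by order of first appearance: spk0, spk1, ..."""
    mapping: Dict[str, str] = {}
    out: List[Segment] = []
    for start, end, label in sorted(segments, key=lambda s: (s[0], s[1])):
        if label not in mapping:
            mapping[label] = f"spk{len(mapping)}"
        out.append((start, end, mapping[label]))
    return out


def same_diarization(a: List[Segment], b: List[Segment]) -> bool:
    """True if a and b have the same boundaries and the same grouping."""
    return canonicalize(a) == canonicalize(b)


hyp_a = [(0.0, 2.5, "A"), (2.5, 4.0, "B"), (4.0, 6.0, "A")]
hyp_b = [(0.0, 2.5, "x"), (2.5, 4.0, "y"), (4.0, 6.0, "x")]  # same grouping, other names
hyp_c = [(0.0, 2.5, "x"), (2.5, 4.0, "y"), (4.0, 6.0, "z")]  # third segment split off

print(same_diarization(hyp_a, hyp_b))  # True
print(same_diarization(hyp_a, hyp_c))  # False
print(num_speakers(hyp_a), num_speakers(hyp_c))  # 2 3
```

Two outputs that differ only in label names count as the same diarization, which is why evaluation has to map hypothesis speakers onto reference speakers before scoring.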

Papers

Showing 176–200 of 328 papers

| Title | Status | Hype |
|---|---|---|
| Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset | | 0 |
| Generation of Speaker Representations Using Heterogeneous Training Batch Assembly | | 0 |
| Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings | Code | 1 |
| Multi-scale Speaker Diarization with Dynamic Scale Weighting | | 0 |
| Using Active Speaker Faces for Diarization in TV shows | | 0 |
| Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries | | 0 |
| Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features | Code | 0 |
| Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios | | 0 |
| Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model | | 0 |
| The xmuspeech system for multi-channel multi-party meeting transcription challenge | | 0 |
| The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge | | 0 |
| Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge | | 0 |
| The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge | | 0 |
| Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge | | 0 |
| The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge | | 0 |
| AVA-AVD: Audio-Visual Speaker Diarization in the Wild | Code | 1 |
| Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information | Code | 0 |
| Low-Latency Online Speaker Diarization with Graph-Based Label Generation | | 0 |
| Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization | | 0 |
| BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1 |
| Multi-Channel End-to-End Neural Diarization with Distributed Microphones | | 0 |
| TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context | Code | 1 |
| Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR | | 0 |
| North America Bixby Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021 | | 0 |
| Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection | Code | 1 |

Benchmark Results

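| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | COS+NJW-SC (Oracle SAD) | DER(%) | 24.05 | | Unverified |
| 2 | EEND | DER(%) | 23.07 | | Unverified |
| 3 | COS+AHC (Oracle SAD) | DER(%) | 21.13 | | Unverified |
| 4 | SA-EEND (2-spk, no-adapt) | DER(%) | 12.66 | | Unverified |
| 5 | EEND-OLA | DER(%) | 12.57 | | Unverified |
| 6 | SA-EEND (2-spk, adapted) | DER(%) | 10.76 | | Unverified |
| 7 | TOLD | DER(%) | 10.14 | | Unverified |
| 8 | COS+B-SC (Oracle SAD) | DER(ig olp) | 8.78 | | Unverified |
| 9 | PLDA+AHC (Oracle SAD) | DER(ig olp) | 8.39 | | Unverified |
| 10 | COS+NME-SC (Oracle SAD) | DER(ig olp) | 7.29 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | x-vector (PLDA + AHC) | DER(%) | 8.39 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 6.73 | | Unverified |
| 3 | TitaNet-M (NME-SC) | DER(%) | 6.47 | | Unverified |
| 4 | TitaNet-S (NME-SC) | DER(%) | 6.37 | | Unverified |
| 5 | x-vector (MCGAN) | DER(%) | 5.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ECAPA (SC) | DER(%) | 2.36 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 2.03 | | Unverified |
| 3 | TitaNet-S (NME-SC) | DER(%) | 2 | | Unverified |
| 4 | TitaNet-M (NME-SC) | DER(%) | 1.99 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TitaNet-S (NME-SC) | DER(%) | 2.22 | | Unverified |
| 2 | TitaNet-M (NME-SC) | DER(%) | 1.79 | | Unverified |
| 3 | ECAPA (SC) | DER(%) | 1.78 | | Unverified |
| 4 | TitaNet-L (NME-SC) | DER(%) | 1.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | x-vector (PLDA + AHC) | DER(%) | 9.72 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 1.19 | | Unverified |
| 3 | TitaNet-M (NME-SC) | DER(%) | 1.13 | | Unverified |
| 4 | TitaNet-S (NME-SC) | DER(%) | 1.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Baseline (the best result in the literature as of Oct. 2019) | DER(%) | 11.2 | | Unverified |
| 2 | pyannote (MFCC) | DER(%) | 10.5 | | Unverified |
| 3 | pyannote (waveform) | DER(%) | 9.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Baseline | DER(%) | 7.7 | | Unverified |
| 2 | pyannote (MFCC) | DER(%) | 5.6 | | Unverified |
| 3 | pyannote (waveform) | DER(%) | 4.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | pyannote (MFCC) | DER(%) | 6.3 | | Unverified |
| 2 | pyannote (waveform) | DER(%) | 6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | d-vector + spectral | DER(%) | 12.54 | | Unverified |
| 2 | titanet-s | DER(%) | 1.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SOND | DER(%) | 4.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UIS-RNN-SML | DER(%) | 27.3 | | Unverified |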

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UIS-RNN | V | 10.6 | | Unverified |
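
Most rows above report DER, the diarization error rate: missed speech plus false-alarm speech plus speaker confusion, divided by the total reference speech time. The sketch below computes it on per-frame labels to make the three error terms concrete. It is a simplified illustration under stated assumptions, not the scoring tool behind any number on this page. It assumes at most one speaker per frame (no overlap) and no forgiveness collar, and it finds the speaker mapping by brute force, which is practical only for a handful of speakers. The name `frame_der` and the frame representation are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional

Frame = Optional[str]  # speaker label for one frame, or None for non-speech


def frame_der(ref: List[Frame], hyp: List[Frame]) -> Dict[str, float]:
    """Frame-level DER with the best one-to-one speaker mapping."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")
    total = sum(1 for r in ref if r is not None)
    if total == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(ref, hyp))
    miss = sum(1 for r, h in pairs if r is not None and h is None)
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    # Pad with None so a hypothesis speaker may stay unmatched.
    candidates = ref_spk + [None] * len(hyp_spk)

    best_correct = 0
    for assigned in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assigned))
        correct = sum(1 for r, h in both if mapping[h] == r)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return {
        "miss": miss,
        "false_alarm": false_alarm,
        "confusion": confusion,
        "der": (miss + false_alarm + confusion) / total,
    }


# Ten frames: speaker A, one frame of silence, then speaker B.
ref = ["A"] * 5 + [None] + ["B"] * 4
# The hypothesis switches speaker one frame early, talks through the
# silence, and drops the last frame.
hyp = ["s1"] * 4 + ["s2"] * 5 + [None]

result = frame_der(ref, hyp)
print(result)
```

Here one frame is missed, one is a false alarm, and one is attributed to the wrong speaker, so the DER is 3 errors over 9 reference speech frames, about 33.3%. The "DER(ig olp)" rows exclude overlapped speech from scoring, which is the regime this single-speaker-per-frame sketch models.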