SOTAVerified

Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
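
The definition above (group segments by speaker, and obtain the speaker count as a by-product) can be illustrated with a minimal sketch. It assumes each segment already comes with a speaker embedding; the `diarize` function, the 0.8 cosine threshold, and the toy two-dimensional embeddings are hypothetical choices for illustration and are not taken from the cited paper or from any system listed on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diarize(segments, threshold=0.8):
    """Greedy clustering of (start, end, embedding) segments.

    Each segment joins the most similar existing cluster if the cosine
    similarity reaches `threshold` (an assumed value); otherwise it opens
    a new cluster. Returns (labelled_segments, number_of_speakers).
    """
    centroids = []  # running sum of embeddings per cluster
    labelled = []
    for start, end, emb in segments:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
        labelled.append((start, end, f"spk{best}"))
    return labelled, len(centroids)


# Toy input: four segments with made-up 2-D embeddings.
segments = [
    (0.0, 2.0, [1.0, 0.1]),
    (2.0, 3.5, [0.1, 1.0]),
    (3.5, 5.0, [0.9, 0.2]),
    (5.0, 6.0, [0.0, 1.0]),
]
labelled, num_speakers = diarize(segments)
for start, end, label in labelled:
    print(f"{start:.1f}-{end:.1f}s {label}")
print("speakers:", num_speakers)
```

The labels are arbitrary indices (spk0, spk1), not identities, which matches the point that diarization co-indexes segments rather than identifying known speakers. The systems listed below use much stronger clustering or end-to-end models; this greedy pass only shows the shape of the output.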

Papers

Showing 51-100 of 328 papers

Title | Status | Hype
From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization | Code | 1
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm | Code | 1
Low-Latency Speech Separation Guided Diarization for Telephone Conversations | Code | 1
Phoneme Boundary Detection using Learnable Segmental Features | Code | 1
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings | Code | 1
Encoder-Decoder Based Attractors for End-to-End Neural Diarization | Code | 1
All-neural online source separation, counting, and diarization for meeting analysis | - | 0
Constrained speaker diarization of TV series based on visual patterns | - | 0
Computer-assisted Speaker Diarization: How to Evaluate Human Corrections | - | 0
Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization | - | 0
A sticky HDP-HMM with application to speaker diarization | - | 0
Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge | - | 0
Guided Speaker Embedding | - | 0
Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding | - | 0
Compositional Embeddings: Joint Perception and Comparison of Class Label Sets | - | 0
ASR Error Correction and Domain Adaptation Using Machine Translation | - | 0
Compositional Embeddings for Multi-Label One-Shot Learning | - | 0
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings | - | 0
Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version) | - | 0
GIST-AiTeR Speaker Diarization System for VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023 | - | 0
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification | - | 0
Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization | - | 0
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification. | - | 0
Chronological Self-Training for Real-Time Speaker Diarization | - | 0
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings | - | 0
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings | - | 0
Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection | - | 0
BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers | - | 0
A Review of Speaker Diarization: Recent Advances with Deep Learning | - | 0
Exploring Speaker Diarization with Mixture of Experts | - | 0
Bi-LSTM Scoring Based Similarity Measurement with Agglomerative Hierarchical Clustering (AHC) for Speaker Diarization | - | 0
A Review of Common Online Speaker Diarization Methods | - | 0
AG-LSEC: Audio Grounded Lexical Speaker Error Correction | - | 0
Generation of Speaker Representations Using Heterogeneous Training Batch Assembly | - | 0
Home monitoring for frailty detection through sound and speaker diarization analysis | - | 0
End-to-End Speaker Diarization as Post-Processing | - | 0
A Reinforcement Learning Framework for Online Speaker Diarization | - | 0
End-to-end Online Speaker Diarization with Target Speaker Tracking | - | 0
Bazinga! A Dataset for Multi-Party Dialogues Structuring | - | 0
A Real-time Speaker Diarization System Based on Spatial Spectrum | - | 0
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond | - | 0
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection | - | 0
EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks | - | 0
EML System Description for VoxCeleb Speaker Diarization Challenge 2020 | - | 0
Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis | - | 0
Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization | - | 0
基於i-vector與PLDA並使用GMM-HMM強制對位之自動語者分段標記系統 (Speaker Diarization based on I-vector PLDA Scoring and using GMM-HMM Forced Alignment) [In Chinese] | - | 0
Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization | - | 0
EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings | - | 0
A framework for the automatic inference of stochastic turn-taking styles | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | COS+NJW-SC (Oracle SAD) | DER(%) | 24.05 | - | Unverified
2 | EEND | DER(%) | 23.07 | - | Unverified
3 | COS+AHC (Oracle SAD) | DER(%) | 21.13 | - | Unverified
4 | SA-EEND (2-spk, no-adapt) | DER(%) | 12.66 | - | Unverified
5 | EEND-OLA | DER(%) | 12.57 | - | Unverified
6 | SA-EEND (2-spk, adapted) | DER(%) | 10.76 | - | Unverified
7 | TOLD | DER(%) | 10.14 | - | Unverified
8 | COS+B-SC (Oracle SAD) | DER(ig olp) | 8.78 | - | Unverified
9 | PLDA+AHC (Oracle SAD) | DER(ig olp) | 8.39 | - | Unverified
10 | COS+NME-SC (Oracle SAD) | DER(ig olp) | 7.29 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | x-vector (PLDA + AHC) | DER(%) | 8.39 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 6.73 | - | Unverified
3 | TitaNet-M (NME-SC) | DER(%) | 6.47 | - | Unverified
4 | TitaNet-S (NME-SC) | DER(%) | 6.37 | - | Unverified
5 | x-vector (MCGAN) | DER(%) | 5.73 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ECAPA (SC) | DER(%) | 2.36 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 2.03 | - | Unverified
3 | TitaNet-S (NME-SC) | DER(%) | 2 | - | Unverified
4 | TitaNet-M (NME-SC) | DER(%) | 1.99 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TitaNet-S (NME-SC) | DER(%) | 2.22 | - | Unverified
2 | TitaNet-M (NME-SC) | DER(%) | 1.79 | - | Unverified
3 | ECAPA (SC) | DER(%) | 1.78 | - | Unverified
4 | TitaNet-L (NME-SC) | DER(%) | 1.73 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | x-vector (PLDA + AHC) | DER(%) | 9.72 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 1.19 | - | Unverified
3 | TitaNet-M (NME-SC) | DER(%) | 1.13 | - | Unverified
4 | TitaNet-S (NME-SC) | DER(%) | 1.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Baseline (the best result in the literature as of Oct. 2019) | DER(%) | 11.2 | - | Unverified
2 | pyannote (MFCC) | DER(%) | 10.5 | - | Unverified
3 | pyannote (waveform) | DER(%) | 9.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Baseline | DER(%) | 7.7 | - | Unverified
2 | pyannote (MFCC) | DER(%) | 5.6 | - | Unverified
3 | pyannote (waveform) | DER(%) | 4.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | pyannote (MFCC) | DER(%) | 6.3 | - | Unverified
2 | pyannote (waveform) | DER(%) | 6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | d-vector + spectral | DER(%) | 12.54 | - | Unverified
2 | titanet-s | DER(%) | 1.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SOND | DER(%) | 4.46 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UIS-RNN-SML | DER(%) | 27.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UIS-RNN | V | 10.6 | - | Unverified
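
Most of the results above are reported as DER (diarization error rate), which is commonly computed as the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below is a simplified frame-level version under stated assumptions: one label per frame, no overlapping speech, no forgiveness collar, and an exhaustive search over speaker mappings that is only practical for a handful of speakers. The function name `frame_der` and the toy labels are hypothetical; the numbers in the tables were produced by each paper's own scoring setup, not by this code.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER for single-speaker-per-frame label sequences.

    `reference` and `hypothesis` are equal-length lists of speaker labels,
    with None marking non-speech. Returns DER as a fraction (multiply by
    100 for a percentage).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same number of frames")
    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Try every one-to-one mapping from hypothesis to reference speakers.
    # Padding with None lets a hypothesis speaker stay unmapped.
    padded_refs = ref_speakers + [None] * len(hyp_speakers)
    best_correct = 0
    for perm in permutations(padded_refs, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Toy example: 8 frames, 7 of them reference speech.
reference = ["A", "A", "A", "B", "B", None, "B", "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, "y"]
der = frame_der(reference, hypothesis)
print(f"DER = {100 * der:.2f}%")
```

In this toy case there is one missed frame, one false-alarm frame, and one confused frame against seven frames of reference speech, so the DER is 3/7. Because the hypothesis labels are arbitrary, the mapping step matters: without it, every frame would count as confusion.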