SOTAVerified

Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.
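The grouping step described above can be illustrated with a minimal sketch. It assumes each segment already comes with a fixed-size speaker embedding, and it uses a simple greedy cosine-similarity clustering chosen only for illustration; it is not the method of any paper listed on this page. The function names, the threshold value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diarize(segments, threshold=0.8):
    """Co-index (start, end, embedding) segments by anonymous speaker index.

    Greedy clustering (an illustrative assumption): a segment joins the most
    similar existing cluster if the similarity reaches `threshold`, otherwise
    it opens a new cluster. The number of clusters is the speaker count.
    """
    centroids = []  # running sum of embeddings per cluster (direction only matters)
    labels = []
    for start, end, emb in segments:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
        labels.append((start, end, best))
    return labels, len(centroids)


# Toy input: four segments with made-up 2-D embeddings.
segments = [
    (0.0, 2.1, [1.0, 0.1]),
    (2.1, 4.0, [0.1, 1.0]),
    (4.0, 6.5, [0.9, 0.2]),
    (6.5, 8.0, [0.2, 0.9]),
]
labels, num_speakers = diarize(segments)
for start, end, speaker in labels:
    print(f"{start:4.1f}-{end:4.1f}s  speaker_{speaker}")
print("distinct speakers:", num_speakers)
```

Note that the output labels are arbitrary indices, not identities: the sketch only says which segments share a speaker, and the speaker count falls out as a by-product of the clustering.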

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Papers

Showing 101-150 of 328 papers

Title | Status | Hype
psifx -- Psychological and Social Interactions Feature Extraction Package | - | 0
A Benchmark for Multi-speaker Anonymization | - | 0
Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency | - | 0
The USTC-NERCSLIP Systems for The ICMC-ASR Challenge | - | 0
Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders | - | 0
Audio-Visual Approach For Multimodal Concurrent Speaker Detection | - | 0
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios | - | 0
From Modular to End-to-End Speaker Diarization | - | 0
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization | - | 0
AG-LSEC: Audio Grounded Lexical Speaker Error Correction | - | 0
Investigating Confidence Estimation Measures for Speaker Diarization | - | 0
A Review of Common Online Speaker Diarization Methods | - | 0
System Description for the Displace Speaker Diarization Challenge 2023 | - | 0
Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control | - | 0
The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences | - | 0
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech | - | 0
The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments | - | 0
Neural Blind Source Separation and Diarization for Distant Speech Recognition | - | 0
Target Speech Diarization with Multimodal Prompts | - | 0
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings | - | 0
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization | - | 0
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification | - | 0
Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning | - | 0
3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization | Code | 0
Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization | - | 0
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications | - | 0
Listening to Multi-talker Conversations: Modular and End-to-end Perspectives | - | 0
Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection | - | 0
The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models | - | 0
Spatial-Temporal Activity-Informed Diarization and Separation | - | 0
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization | Code | 0
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription | - | 0
Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization | - | 0
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Code | 0
Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification | - | 0
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition | - | 0
EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings | - | 0
Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization | - | 0
Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments | - | 0
UniX-Encoder: A Universal X-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing | - | 0
EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks | - | 0
Powerset multi-class cross entropy loss for neural speaker diarization | Code | 0
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System | - | 0
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation | - | 0
End-to-end Online Speaker Diarization with Target Speaker Tracking | - | 0
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition | - | 0
NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization | - | 0
Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation | - | 0
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network | - | 0
Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version) | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | COS+NJW-SC (Oracle SAD) | DER(%) | 24.05 | - | Unverified
2 | EEND | DER(%) | 23.07 | - | Unverified
3 | COS+AHC (Oracle SAD) | DER(%) | 21.13 | - | Unverified
4 | SA-EEND (2-spk, no-adapt) | DER(%) | 12.66 | - | Unverified
5 | EEND-OLA | DER(%) | 12.57 | - | Unverified
6 | SA-EEND (2-spk, adapted) | DER(%) | 10.76 | - | Unverified
7 | TOLD | DER(%) | 10.14 | - | Unverified
8 | COS+B-SC (Oracle SAD) | DER(ig olp) | 8.78 | - | Unverified
9 | PLDA+AHC (Oracle SAD) | DER(ig olp) | 8.39 | - | Unverified
10 | COS+NME-SC (Oracle SAD) | DER(ig olp) | 7.29 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | x-vector (PLDA + AHC) | DER(%) | 8.39 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 6.73 | - | Unverified
3 | TitaNet-M (NME-SC) | DER(%) | 6.47 | - | Unverified
4 | TitaNet-S (NME-SC) | DER(%) | 6.37 | - | Unverified
5 | x-vector (MCGAN) | DER(%) | 5.73 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ECAPA (SC) | DER(%) | 2.36 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 2.03 | - | Unverified
3 | TitaNet-S (NME-SC) | DER(%) | 2 | - | Unverified
4 | TitaNet-M (NME-SC) | DER(%) | 1.99 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TitaNet-S (NME-SC) | DER(%) | 2.22 | - | Unverified
2 | TitaNet-M (NME-SC) | DER(%) | 1.79 | - | Unverified
3 | ECAPA (SC) | DER(%) | 1.78 | - | Unverified
4 | TitaNet-L (NME-SC) | DER(%) | 1.73 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | x-vector (PLDA + AHC) | DER(%) | 9.72 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 1.19 | - | Unverified
3 | TitaNet-M (NME-SC) | DER(%) | 1.13 | - | Unverified
4 | TitaNet-S (NME-SC) | DER(%) | 1.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Baseline (the best result in the literature as of Oct. 2019) | DER(%) | 11.2 | - | Unverified
2 | pyannote (MFCC) | DER(%) | 10.5 | - | Unverified
3 | pyannote (waveform) | DER(%) | 9.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Baseline | DER(%) | 7.7 | - | Unverified
2 | pyannote (MFCC) | DER(%) | 5.6 | - | Unverified
3 | pyannote (waveform) | DER(%) | 4.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | pyannote (MFCC) | DER(%) | 6.3 | - | Unverified
2 | pyannote (waveform) | DER(%) | 6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | d-vector + spectral | DER(%) | 12.54 | - | Unverified
2 | titanet-s | DER(%) | 1.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SOND | DER(%) | 4.46 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UIS-RNN-SML | DER(%) | 27.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UIS-RNN | V1 | 10.6 | - | Unverified
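
The DER(%) figures reported above refer to diarization error rate, which is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time divided by the total reference speech time. The sketch below is a simplified frame-level illustration under stated assumptions: one label per fixed-length frame, at most one speaker per frame (so overlapped speech is not modeled), no forgiveness collar, and a brute-force search for the best one-to-one speaker mapping. The function name `frame_der` and the toy labels are hypothetical; published scores are normally produced by interval-based scoring tools, not by code like this.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level diarization error rate for single-speaker frames.

    `reference` and `hypothesis` are equal-length lists with one label per
    frame; None marks non-speech. Hypothesis labels are arbitrary, so the
    one-to-one mapping onto reference speakers that matches the most frames
    is found by exhaustive search (fine for a handful of speakers).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    pairs = list(zip(reference, hypothesis))
    total_speech = sum(r is not None for r, _ in pairs)
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r, _ in both})
    hyp_speakers = sorted({h for _, h in both})
    # Pad with None so a hypothesis speaker may remain unmapped.
    targets = ref_speakers + [None] * len(hyp_speakers)

    best_correct = 0
    for perm in permutations(targets, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Toy example: 8 frames, reference speakers "A"/"B", hypothesis clusters 0/1.
reference = ["A", "A", "A", "B", "B", None, "B", "B"]
hypothesis = [0, 0, 1, 1, 1, 1, None, 1]
der = frame_der(reference, hypothesis)
print(f"DER = {100 * der:.2f}%")
```

In the toy example there is one missed frame, one false-alarm frame, and one confused frame against seven reference speech frames. The "DER(ig olp)" rows presumably denote a variant that ignores overlapped speech regions during scoring, which is why those numbers should not be compared directly with the plain DER(%) rows.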