SOTAVerified

Speaker Recognition

Speaker Recognition is the process of identifying a person (speaker identification) or confirming a claimed identity (speaker verification) from segments of their speech.

Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
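The two tasks in this definition can be illustrated with a minimal sketch. It assumes each utterance has already been mapped to a fixed-length speaker embedding (for example an x-vector or ECAPA-TDNN output); the toy vectors, the function names, and the 0.5 threshold are hypothetical. Verification compares a probe against one claimed speaker using cosine similarity and a threshold. Closed-set identification picks the enrolled speaker with the highest score.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, claimed, threshold=0.5):
    """Verification: accept the identity claim if the score reaches the threshold.
    The default of 0.5 is an illustrative assumption, not a tuned value."""
    return cosine(probe, claimed) >= threshold

def identify(probe, enrolled):
    """Closed-set identification: return the enrolled speaker with the highest score."""
    return max(enrolled, key=lambda name: cosine(probe, enrolled[name]))

# Toy 3-dimensional vectors standing in for real speaker embeddings.
enrolled = {"alice": [1.0, 0.1, 0.0], "bob": [0.0, 1.0, 0.2]}
probe = [0.9, 0.2, 0.0]

print(identify(probe, enrolled))          # alice
print(verify(probe, enrolled["alice"]))   # True
print(verify(probe, enrolled["bob"]))     # False
```

In an open-set setting, `identify` would also need a rejection threshold, so that probes from unenrolled speakers are not forced onto the closest enrolled one.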

Papers

Showing 1–50 of 435 papers

Title | Status | Hype
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | - | 0
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments | - | 0
CoLMbo: Speaker Language Model for Descriptive Profiling | Code | 0
Learning Speaker-Invariant Visual Features for Lipreading | - | 0
Rhythm Features for Speaker Identification | - | 0
Synthetic Speech Source Tracing using Metric Learning | - | 0
LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention | - | 0
Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction | - | 0
Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations | - | 0
Pretraining Multi-Speaker Identification for Neural Speaker Diarization | - | 0
Private kNN-VC: Interpretable Anonymization of Converted Speech | Code | 0
SEED: Speaker Embedding Enhancement Diffusion Model | Code | 2
Analysis of ABC Frontend Audio Systems for the NIST-SRE24 | - | 0
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition | - | 0
From Dialect Gaps to Identity Maps: Tackling Variability in Speaker Verification | - | 0
Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks | - | 0
Language Modelling for Speaker Diarization in Telephonic Interviews | - | 0
VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition | - | 0
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution | - | 0
Study on Inter and Intra Speaker Variability in Speaker Recognition | - | 0
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks | - | 0
Investigation of Speaker Representation for Target-Speaker Speech Processing | - | 0
The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities | - | 0
Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample | - | 0
Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection | - | 0
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | - | 0
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels | - | 0
RoboVox Far Field Speaker Recognition: A Novel Data Augmentation Approach with Pretrained Models | - | 0
Text-To-Speech Synthesis In The Wild | - | 0
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1
Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings | - | 0
The VoxCeleb Speaker Recognition Challenge: A Retrospective | - | 0
Convexity-based Pruning of Speech Representation Models | - | 0
Long-Term Conversation Analysis: Privacy-Utility Trade-off under Noise and Reverberation | - | 0
VoxSim: A perceptual voice similarity dataset | Code | 1
Reshape Dimensions Network for Speaker Recognition | Code | 2
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | - | 0
Team HYU ASML ROBOVOX SP Cup 2024 System Description | - | 0
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Code | 5
Phonetic Richness for Improved Automatic Speaker Verification | - | 0
A voice and speech corpus of patients who underwent upper airway surgery in pre- and post-operative states | Code | 0
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation | - | 0
We Need Variations in Speech Generation: Sub-center Modelling for Speaker Embeddings | - | 0
Prosody-Driven Privacy-Preserving Dementia Detection | Code | 0
Open-Source Conversational AI with SpeechBrain 1.0 | - | 0
CEC: A Noisy Label Detection Method for Speaker Recognition | - | 0
Challenging margin-based speaker embedding extractors by using the variational information bottleneck | - | 0
PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation | - | 0
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection | - | 0
Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | w2v2-aam | EER | 1.88 | - | Unverified
2 | WavLM+ECAPA-TDNN | EER | 0.39 | - | Unverified
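Both benchmark entries report EER (equal error rate). This is the operating point at which the false acceptance rate (FAR) on impostor trials equals the false rejection rate (FRR) on genuine trials, and lower is better. The following is a minimal sketch. It assumes similarity scores where higher means "same speaker" and approximates the EER by trying every observed score as a threshold, then taking the point where FAR and FRR are closest. The function name and toy scores are hypothetical, and this is not the evaluation protocol of either listed entry.

```python
def compute_eer(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold.
    FAR = fraction of impostor trials accepted.
    FRR = fraction of genuine trials rejected.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy scores: genuine trials are same-speaker pairs, impostor trials are different-speaker pairs.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5, 0.65]
print(f"EER = {compute_eer(genuine, impostor):.1%}")  # EER = 20.0%
```

Real toolkits usually interpolate along the full ROC curve rather than picking the closest discrete threshold. The simple sweep is kept here for readability.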