SOTAVerified

Speaker Identification

Papers

Showing 51–100 of 248 papers

Title | Status | Hype
HPP-Voice: A Large-Scale Evaluation of Speech Embeddings for Multi-Phenotypic Classification | - | 0
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio | - | 0
Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues | - | 0
From Dialect Gaps to Identity Maps: Tackling Variability in Speaker Verification | - | 0
Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization | - | 0
A Preliminary Exploration with GPT-4o Voice Mode | - | 0
SCDiar: a streaming diarization system based on speaker change detection and speech recognition | - | 0
Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models | - | 0
PolInterviews -- A Dataset of German Politician Public Broadcast Interviews | - | 0
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding | Code | 0
Machine Unlearning reveals that the Gender-based Violence Victim Condition can be detected from Speech in a Speaker-Agnostic Setting | - | 0
Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network | Code | 0
Towards Advanced Speech Signal Processing: A Statistical Perspective on Convolution-Based Architectures and its Applications | - | 0
Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments | - | 0
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization | - | 0
Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample | - | 0
How Redundant Is the Transformer Stack in Speech Representation Models? | - | 0
A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR | - | 0
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue | Code | 0
Progressive Residual Extraction based Pre-training for Speech Representation Learning | - | 0
Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation | Code | 0
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models | Code | 0
DASB -- Discrete Audio and Speech Benchmark | - | 0
Evaluating Speaker Identity Coding in Self-supervised Models and Humans | - | 0
TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches | - | 0
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | Code | 0
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling | - | 0
Hearing-Loss Compensation Using Deep Neural Networks: A Framework and Results From a Listening Test | - | 0
A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement | - | 0
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification | - | 0
Effect of utterance duration and phonetic content on speaker identification using second-order statistical methods | - | 0
Significance of Chirp MFCC as a Feature in Speech and Audio Applications | - | 0
Probing Self-supervised Learning Models with Target Speech Extraction | - | 0
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis | - | 0
Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models | - | 0
SIG: Speaker Identification in Literature via Prompt-Based Generation | Code | 0
Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices | - | 0
Efficiency-oriented approaches for self-supervised speech representation learning | - | 0
Privacy-preserving Representation Learning for Speech Understanding | - | 0
Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition | - | 0
End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis | - | 0
Test-Time Training for Speech | - | 0
Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks | - | 0
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction | - | 0
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification | Code | 0
Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment | Code | 0
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset | - | 0
VoxWatch: An open-set speaker recognition benchmark on VoxCeleb | - | 0
Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition | - | 0
Few-Shot Speaker Identification Using Lightweight Prototypical Network with Feature Grouping and Interaction | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MSM-MAE | Top-1 (%) | 96.6 | - | Unverified
2 | M2D/0.6 | Top-1 (%) | 96.5 | - | Unverified
3 | M2D/0.7 | Top-1 (%) | 96.3 | - | Unverified
4 | M2D ratio=0.6 | Top-1 (%) | 94.8 | - | Unverified
5 | AudioMAE (local) | Top-1 (%) | 94.8 | - | Unverified
6 | ATST Base (ours) | Top-1 (%) | 94.3 | - | Unverified
7 | AudioMAE (global) | Top-1 (%) | 94.1 | - | Unverified
8 | AutoSpeech (N=8,C=128) | Top-1 (%) | 87.66 | - | Unverified
9 | SSAST-FRAME | Top-1 (%) | 80.8 | - | Unverified
10 | SSAMBA | Top-1 (%) | 70.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 67.77 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 80.83 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 95.13 | - | Unverified