SOTAVerified

Speaker Identification

Papers

Showing 51–100 of 248 papers

Title | Status | Hype
Privacy-preserving Representation Learning for Speech Understanding | - | 0
Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition | - | 0
End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis | - | 0
InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models | Code | 1
Test-Time Training for Speech | - | 0
Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks | - | 0
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction | - | 0
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification | Code | 0
Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment | Code | 0
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset | - | 0
VoxWatch: An open-set speaker recognition benchmark on VoxCeleb | - | 0
Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals | Code | 1
Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition | - | 0
Few-Shot Speaker Identification Using Lightweight Prototypical Network with Feature Grouping and Interaction | - | 0
MPCHAT: Towards Multimodal Persona-Grounded Conversation | Code | 1
Ordered and Binary Speaker Embedding | - | 0
On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications | - | 0
GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding | Code | 1
Security and Privacy Problems in Voice Assistant Applications: A Survey | - | 0
Unsupervised Speech Representation Pooling Using Vector Quantization | Code | 0
HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones | - | 0
Ensemble knowledge distillation of self-supervised speech models | - | 0
ExARN: self-attending RNN for target speaker extraction | - | 0
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification | Code | 1
MelHuBERT: A simplified HuBERT on Mel spectrograms | Code | 1
Multi-Label Training for Text-Independent Speaker Identification | - | 0
Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples | - | 0
Symmetric Saliency-based Adversarial Attack To Speaker Identification | - | 0
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input | Code | 0
Speaker Identification from emotional and noisy speech data using learned voice segregation and Speech VGG | - | 0
Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation | - | 0
Cross-Lingual Speaker Identification Using Distant Supervision | Code | 0
Text Independent Speaker Identification System for Access Control | - | 0
Computing with Hypervectors for Efficient Speaker Identification | - | 0
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Code | 1
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models | - | 0
Masked Autoencoders that Listen | Code | 1
Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification | - | 0
End-to-End Chinese Speaker Identification | Code | 1
Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones | - | 0
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre | - | 0
Extended U-Net for Speaker Verification in Noisy Environments | Code | 1
Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems | - | 0
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations | - | 0
Speaker Identification using Speech Recognition | - | 0
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information | - | 0
VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution | - | 0
EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification | Code | 0
ATST: Audio Representation Learning with Teacher-Student Transformer | Code | 1
Page 2 of 5

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MSM-MAE | Top-1 (%) | 96.6 | - | Unverified
2 | M2D/0.6 | Top-1 (%) | 96.5 | - | Unverified
3 | M2D/0.7 | Top-1 (%) | 96.3 | - | Unverified
4 | M2D ratio=0.6 | Top-1 (%) | 94.8 | - | Unverified
5 | AudioMAE (local) | Top-1 (%) | 94.8 | - | Unverified
6 | ATST Base (ours) | Top-1 (%) | 94.3 | - | Unverified
7 | AudioMAE (global) | Top-1 (%) | 94.1 | - | Unverified
8 | AutoSpeech (N=8,C=128) | Top-1 (%) | 87.66 | - | Unverified
9 | SSAST-FRAME | Top-1 (%) | 80.8 | - | Unverified
10 | SSAMBA | Top-1 (%) | 70.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 67.77 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 80.83 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 95.13 | - | Unverified