Reshape Dimensions Network for Speaker Recognition

2024-07-25 · Code Available

Ivan Yakovlev, Rostislav Makarov, Andrei Balykin, Pavel Malov, Anton Okhotnikov, Nikita Torgashov

Abstract

In this paper, we present Reshape Dimensions Network (ReDimNet), a novel neural network architecture for extracting utterance-level speaker representations. Our approach leverages dimensionality reshaping of 2D feature maps into 1D signal representations and vice versa, enabling the joint usage of 1D and 2D blocks. We propose an original network topology that preserves the volume of channel-timestep-frequency outputs of 1D and 2D blocks, facilitating efficient aggregation of residual feature maps. Moreover, ReDimNet is efficiently scalable, and we introduce a range of model sizes, varying from 1M to 15M parameters and from 0.5 to 20 GMACs. Our experimental results demonstrate that ReDimNet achieves state-of-the-art performance in speaker recognition while reducing computational complexity and the number of model parameters.
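The volume-preserving reshape mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tensor layout (batch, channels, frequency, time), the helper names `to_1d` and `to_2d`, and the example sizes are all assumptions chosen for illustration. The point is only that folding the frequency axis into the channel axis changes the view of the data without changing the number of elements, so 1D and 2D blocks can operate on the same feature volume.

```python
import numpy as np


def to_1d(x2d: np.ndarray) -> np.ndarray:
    """Fold (batch, C, F, T) into (batch, C*F, T) for 1D blocks (assumed layout)."""
    b, c, f, t = x2d.shape
    return x2d.reshape(b, c * f, t)


def to_2d(x1d: np.ndarray, channels: int) -> np.ndarray:
    """Unfold (batch, C*F, T) back into (batch, C, F, T) for 2D blocks."""
    b, cf, t = x1d.shape
    if cf % channels != 0:
        raise ValueError("channel count must divide the folded dimension")
    return x1d.reshape(b, channels, cf // channels, t)


# Hypothetical sizes: batch=2, C=4 channels, F=8 frequency bins, T=10 timesteps.
x = np.arange(2 * 4 * 8 * 10, dtype=np.float32).reshape(2, 4, 8, 10)
y = to_1d(x)                   # shape (2, 32, 10)
x_back = to_2d(y, channels=4)  # shape (2, 4, 8, 10)
```

Because both views hold the same C·F·T elements per utterance, a residual from a 2D block can be added to the output of a 1D block after a reshape alone, with no projection layer needed to match sizes.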

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| VoxCeleb | ReDimNet-B6-SF2-LM-ASNorm (15.0M) | EER | 0.37 | | Unverified |
| VoxCeleb | ReDimNet-B5-SF2-LM-ASNorm (9.2M) | EER | 0.39 | | Unverified |
| VoxCeleb | ReDimNet-B6-SF2-LM (15.0M) | EER | 0.4 | | Unverified |
| VoxCeleb | ReDimNet-B5-SF2-LM (9.2M) | EER | 0.43 | | Unverified |
| VoxCeleb | ReDimNet-B4-LM-ASNorm (6.3M) | EER | 0.44 | | Unverified |
| VoxCeleb | ReDimNet-B3-LM-ASNorm (3.0M) | EER | 0.47 | | Unverified |
| VoxCeleb | ReDimNet-B3-LM (3.0M) | EER | 0.5 | | Unverified |
| VoxCeleb | ReDimNet-B4-LM (6.3M) | EER | 0.51 | | Unverified |
| VoxCeleb | ReDimNet-B2-SF2-LM-ASNorm (4.7M) | EER | 0.52 | | Unverified |
| VoxCeleb | ReDimNet-B2-SF2-LM (4.7M) | EER | 0.57 | | Unverified |
| VoxCeleb | ReDimNet-B1-LM-ASNorm (2.2M) | EER | 0.73 | | Unverified |
| VoxCeleb | ReDimNet-B1-LM (2.2M) | EER | 0.85 | | Unverified |
| VoxCeleb | ReDimNet-B0-LM-ASNorm (1.0M) | EER | 1.07 | | Unverified |
| VoxCeleb | ReDimNet-B0-LM (1.0M) | EER | 1.16 | | Unverified |
| VoxCeleb1 | ReDimNet-B6-SF2-LM-ASNorm (15.0M) | EER | 0.37 | | Unverified |
| VoxCeleb1 | ReDimNet-B5-SF2-LM-ASNorm (9.2M) | EER | 0.39 | | Unverified |
| VoxCeleb1 | ReDimNet-B6-SF2-LM (15.0M) | EER | 0.4 | | Unverified |
| VoxCeleb1 | ReDimNet-B5-SF2-LM (9.2M) | EER | 0.43 | | Unverified |
| VoxCeleb1 | ReDimNet-B4-LM-ASNorm (6.3M) | EER | 0.44 | | Unverified |
| VoxCeleb1 | ReDimNet-B3-LM-ASNorm (3.0M) | EER | 0.47 | | Unverified |
| VoxCeleb1 | ReDimNet-B3-LM (3.0M) | EER | 0.5 | | Unverified |
| VoxCeleb1 | ReDimNet-B4-LM (6.3M) | EER | 0.51 | | Unverified |
| VoxCeleb1 | ReDimNet-B2-SF2-LM-ASNorm (4.7M) | EER | 0.52 | | Unverified |
| VoxCeleb1 | ReDimNet-B2-SF2-LM (4.7M) | EER | 0.57 | | Unverified |
| VoxCeleb1 | ReDimNet-B1-LM-ASNorm (2.2M) | EER | 0.73 | | Unverified |
| VoxCeleb1 | ReDimNet-B1-LM (2.2M) | EER | 0.85 | | Unverified |
| VoxCeleb1 | ReDimNet-B0-LM-ASNorm (1.0M) | EER | 1.07 | | Unverified |
| VoxCeleb1 | ReDimNet-B0-LM (1.0M) | EER | 1.16 | | Unverified |
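
For readers unfamiliar with the metric in the table, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate on a set of verification trials. The sketch below is a generic approximation over a finite list of scores, not the evaluation script used for the results above; the function name `equal_error_rate` and the toy scores are hypothetical. It returns a fraction, whereas speaker-verification results are commonly reported as percentages.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for th in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < th for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy similarity scores (hypothetical): higher means "more likely same speaker".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)  # fraction in [0, 1]
```

With only a handful of trials the two error curves rarely cross exactly, which is why the sketch averages the two rates at the closest threshold; evaluation toolkits typically interpolate instead.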
