SOTAVerified

Speech Tokenization

Speech tokenization is the task of representing a speech signal as a sequence of discrete units. These representations can then be used in downstream tasks such as automatic speech recognition and text-to-speech, and they serve as the basis of speech language models.
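To make "a sequence of discrete units" concrete, here is a minimal sketch of one common way to discretize speech: assign each feature frame to its nearest entry in a codebook. It is an illustration under stated assumptions, not the method of any paper listed on this page. The 2-D frames, the 3-entry codebook, and the function names are all made up for the example. Real tokenizers typically learn the codebook from data and operate on much higher-dimensional features.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature frame to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def deduplicate(tokens: Sequence[int]) -> List[int]:
    """Collapse runs of repeated units (an optional post-processing step)."""
    out: List[int] = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out


# Toy 2-D "features" and a hypothetical 3-entry codebook (both made up).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
frames = [(0.1, -0.1), (0.2, 0.0), (0.9, 1.2), (1.1, 0.8), (-0.8, 0.9)]

tokens = tokenize(frames, codebook)
print(tokens)               # [0, 0, 1, 1, 2]
print(deduplicate(tokens))  # [0, 1, 2]
```

The resulting integer sequence is what a downstream model would consume in place of the raw waveform.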

Papers

Showing 1-21 of 21 papers

Title | Status | Hype
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | - | 0
Factorized RVQ-GAN For Disentangled Speech Tokenization | - | 0
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | Code | 1
Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models | - | 0
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English | - | 0
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling | Code | 2
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | - | 0
Recent Advances in Discrete Speech Tokens: A Review | - | 0
BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection | Code | 0
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models | - | 0
DM-Codec: Distilling Multimodal Representations for Speech Tokenization | Code | 2
Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Code | 2
SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Code | 2
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT | Code | 1
LAST: Language Model Aware Speech Tokenization | - | 0
STAB: Speech Tokenizer Assessment Benchmark | - | 0
dMel: Speech Tokenization made Simple | Code | 1
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing | - | 0
Scaling Properties of Speech Language Models | - | 0
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data | - | 0
RepCodec: A Speech Representation Codec for Speech Tokenization | Code | 1

No leaderboard results yet.