Speech Tokenization

Speech tokenization is the task of representing a speech signal as a sequence of discrete units. These representations can then be used for downstream tasks such as automatic speech recognition and text-to-speech, and they serve as the basis of speech language models.
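
As a rough illustration of what "a sequence of discrete units" means, the sketch below frames a waveform, computes a toy two-dimensional feature per frame, and maps each frame to the index of its nearest codebook entry. Everything here is an assumption made for illustration: the function names, the hand-picked features (RMS energy and zero-crossing rate), and the two-entry codebook are hypothetical and are not taken from any paper listed on this page. Real tokenizers typically rely on learned features and learned codebooks.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into (possibly overlapping) frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Toy feature vector: (RMS energy, zero-crossing rate)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(frame) - 1))

def tokenize(samples, codebook, frame_len=160, hop=160):
    """Map each frame to the index of its nearest codebook vector."""
    tokens = []
    for frame in frame_signal(samples, frame_len, hop):
        feat = frame_features(frame)
        # Squared Euclidean distance from this frame to every code vector.
        dists = [sum((f - c) ** 2 for f, c in zip(feat, code))
                 for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

# Hypothetical input: 20 ms of silence, then 20 ms of a 440 Hz tone at 16 kHz.
sample_rate = 16000
silence = [0.0] * 320
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(320)]
signal = silence + tone

# Hypothetical hand-made codebook: unit 0 ~ "silence", unit 1 ~ "tone".
codebook = [(0.0, 0.0), (0.7, 0.05)]

tokens = tokenize(signal, codebook)
print(tokens)  # [0, 0, 1, 1]
```

The continuous waveform becomes a short integer sequence, which is the form a language model can consume. The quality of the units depends entirely on the features and the codebook, which is where the listed papers differ.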

Papers

Showing 1–10 of 21 papers

Title	Date	Tasks	Status	Hype
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling	Apr 9, 2025	Language Modeling, Language Modelling	Code Available	2
DM-Codec: Distilling Multimodal Representations for Speech Tokenization	Oct 19, 2024	Self-Supervised Learning, Speech Tokenization	Code Available	2
Sylber: Syllabic Embedding Representation of Speech from Raw Audio	Oct 9, 2024	Language Modeling, Language Modelling	Code Available	2
SyllableLM: Learning Coarse Semantic Units for Speech Language Models	Oct 5, 2024	Clustering, Language Modeling	Code Available	2
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework	May 24, 2025	Adversarial Attack, Speech Tokenization	Code Available	1
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT	Sep 16, 2024	Acoustic Unit Discovery, Clustering	Code Available	1
dMel: Speech Tokenization made Simple	Jul 22, 2024	Decoder, Language Modeling	Code Available	1
RepCodec: A Speech Representation Codec for Speech Tokenization	Aug 31, 2023	Language Modeling, Language Modelling	Code Available	1
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization	Jun 20, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Factorized RVQ-GAN For Disentangled Speech Tokenization	Jun 18, 2025	Disentanglement, Knowledge Distillation	Unverified	0

No leaderboard results yet.