
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition

2023-10-18

Kari A Noriy, Xiaosong Yang, Marcin Budka, Jian Jun Zhang


Abstract

Multilingual speech processing requires understanding emotions, a task made difficult by limited labelled data. CLARA minimizes reliance on labelled data and enhances generalization across languages by fostering shared representations that aid cross-lingual transfer of speech and emotion, even when data is scarce. Our approach captures emotional nuances in speech, mitigating the subjectivity of emotion assessment. Using a large multilingual audio corpus and self-supervised learning, CLARA develops speech representations enriched with emotional information, advancing emotion-aware multilingual speech processing. Our method expands data coverage through augmentation, employs textual embeddings for visual understanding, and transfers knowledge from high- to low-resource languages. CLARA achieves strong performance on emotion recognition, language comprehension, and audio benchmarks, excelling in zero-shot and few-shot settings, and adapts well to low-resource languages, marking progress in multilingual speech representation learning.
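The abstract describes contrastive learning over paired audio and text embeddings. As a rough illustration of the general idea (not the paper's actual implementation), the sketch below computes a symmetric InfoNCE-style contrastive loss in NumPy: matched audio–text pairs are pulled together and mismatched pairs pushed apart. The function name, batch shapes, and temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Row i of audio_emb is assumed to pair with row i of text_emb.
    This is a generic CLIP-style objective, not CLARA's exact loss.
    """
    # L2-normalize so the dot product is cosine similarity
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature  # (n, n) similarity matrix
    n = logits.shape[0]

    def cross_entropy(l):
        # Numerically stable log-softmax; targets are the diagonal
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(n), np.arange(n)].mean()

    # Average the audio-to-text and text-to-audio directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly aligned batches yield a low loss, while shuffling one side (breaking the pairing) raises it, which is the signal the encoders are trained to exploit.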
