
Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning

2025-07-05

Nayeon Kim, Eojin Jeon, Jun-Hyung Park, SangKeun Lee


Abstract

In this study, we introduce KOPL, a novel framework for handling Korean out-of-vocabulary (OOV) words with Phoneme representation Learning. Our work builds on a linguistic property of Korean as a phonemic script: the high correlation between phonemes and letters. KOPL combines phoneme and word representations for Korean OOV words, so that the resulting OOV word representations capture both the text and the phoneme information of words. We empirically demonstrate that KOPL significantly improves performance on Korean Natural Language Processing (NLP) tasks, and that it can be readily integrated into existing static and contextual Korean embedding models in a plug-and-play manner. Notably, we show that KOPL outperforms the state-of-the-art model by an average of 1.9%. Our code is available at https://github.com/jej127/KOPL.git.
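
The letter-level structure the abstract relies on can be illustrated with a short sketch. This is not the KOPL implementation and makes no claim about how KOPL derives phonemes; it only shows, using standard Unicode arithmetic, how a precomposed Hangul syllable splits into its constituent letters (jamo). The function names `decompose_syllable` and `decompose_word` are hypothetical and chosen for this example.

```python
import unicodedata

# Unicode layout of precomposed Hangul syllables (U+AC00..U+D7A3):
# index = (initial * 21 + vowel) * 28 + final
HANGUL_BASE = 0xAC00
NUM_VOWELS = 21
NUM_FINALS = 28                 # includes the "no final consonant" slot
NUM_SYLLABLES = 19 * NUM_VOWELS * NUM_FINALS  # 11172 syllable blocks


def decompose_syllable(ch: str) -> list[str]:
    """Split one Hangul syllable block into conjoining jamo (hypothetical helper)."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < NUM_SYLLABLES:
        return [ch]  # not a precomposed Hangul syllable: leave unchanged
    initial, rest = divmod(index, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    jamo = [chr(0x1100 + initial), chr(0x1161 + vowel)]
    if final:  # final == 0 means the syllable has no final consonant
        jamo.append(chr(0x11A7 + final))
    return jamo


def decompose_word(word: str) -> list[str]:
    """Decompose every character of a word into jamo (hypothetical helper)."""
    return [j for ch in word for j in decompose_syllable(ch)]


word = "한국어"
letters = decompose_word(word)
# Cross-check against Python's canonical decomposition (NFD).
assert "".join(letters) == unicodedata.normalize("NFD", word)
print(len(letters))  # 8 jamo: 3 + 3 + 2
```

Spelling and pronunciation can still diverge in Korean through sound-change rules, so a real phoneme sequence would need a grapheme-to-phoneme step on top of this decomposition; that step is outside the scope of this sketch.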
