Homonym normalisation by word sense clustering: a case in Japanese

2020-12-01, COLING 2020

Yo Sato, Kevin Heffernan

Abstract

This work presents a word sense clustering method that differentiates homonyms and merges homophones. It takes Japanese as an example, a language in which orthographical variation causes problems for language processing. The method uses contextualised embeddings (BERT) to cluster tokens into distinct sense groups, and these groups are then used to normalise synonymous instances to a single representative form. The benefit of this normalisation is shown in language modelling as well as in transliteration.
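The idea of clustering token instances by sense and then rewriting every instance to one representative form can be illustrated with a small sketch. The abstract does not state which clustering algorithm or representative-selection rule is used, so this is not the authors' method: it assumes a simple greedy cosine-threshold clustering, picks the most frequent surface form in each cluster as the representative, and uses hand-made 3-dimensional vectors in place of real BERT embeddings. The names `cluster_senses` and `normalise` and the 0.8 threshold are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_senses(vectors, threshold=0.8):
    """Hypothetical greedy clustering (an assumption, not the paper's algorithm):
    each vector joins the most similar existing cluster if the cosine similarity
    to that cluster's centroid is >= threshold, otherwise it opens a new cluster."""
    centroids = []  # running sum of member vectors (same direction as the mean)
    labels = []
    for vec in vectors:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(vec))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [a + b for a, b in zip(centroids[best], vec)]
            labels.append(best)
    return labels


def normalise(tokens, labels):
    """Rewrite each token to the most frequent surface form in its sense cluster
    (the selection rule is an assumption made for illustration)."""
    counts = {}
    for form, label in zip(tokens, labels):
        counts.setdefault(label, Counter())[form] += 1
    representative = {label: c.most_common(1)[0][0] for label, c in counts.items()}
    return [representative[label] for label in labels]


# "はし" is a homonym (橋 "bridge" / 箸 "chopsticks"); the kana and kanji spellings
# of each sense are orthographic variants. The toy vectors stand in for BERT
# embeddings of each token in its sentence context.
tokens = ["橋", "はし", "橋", "箸", "はし", "箸"]
vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],  # bridge contexts
    [0.1, 1.0, 0.0], [0.0, 0.9, 0.2], [0.1, 1.0, 0.1],  # chopsticks contexts
]
labels = cluster_senses(vectors)
print(labels)                       # [0, 0, 0, 1, 1, 1]
print(normalise(tokens, labels))    # ['橋', '橋', '橋', '箸', '箸', '箸']
```

In this toy run the two occurrences of the homonym はし are separated into different sense clusters, while each kana spelling is merged with the kanji spelling of its own sense.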

Tasks

Clustering, Language Modelling, Transliteration