
Eigencharacter: An Embedding of Chinese Character Orthography

2019-11-01 · WS 2019

Yu-Hsiang Tseng, Shu-Kai Hsieh


Abstract

Chinese characters are unique in their logographic nature, which inherently encodes world knowledge accumulated through thousands of years of evolution. This paper proposes an embedding approach, namely eigencharacter (EC) space, which helps NLP applications easily access the knowledge encoded in Chinese orthography. These EC representations are automatically extracted, encode both structural and radical information, and integrate easily with other computational models. We built EC representations of 5,000 Chinese characters, investigated the orthographic knowledge encoded in ECs, and demonstrated how these ECs identify visually similar characters using both structural and radical information.
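The abstract does not spell out how the EC space is computed. As a rough illustration only, the sketch below assumes an eigenface-style reading of the name: each character is treated as a small bitmap, the bitmaps are mean-centered and decomposed with SVD, and the leading components span the embedding space. The function names (`fit_eigen_space`, `embed`, `most_similar`), the 16x16 bitmap size, the choice of 8 components, and the random stand-in "glyphs" are all hypothetical and not taken from the paper; real use would render actual characters from a font.

```python
import numpy as np

def fit_eigen_space(bitmaps, k):
    """Return (mean, components) for an (n, h, w) stack of glyph bitmaps.

    components has shape (k, h*w); its rows are orthonormal directions
    found by SVD of the mean-centered, flattened bitmaps (an assumption
    about how such a space might be built, not the paper's stated method).
    """
    flat = bitmaps.reshape(len(bitmaps), -1).astype(float)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:k]

def embed(bitmaps, mean, components):
    """Project glyph bitmaps onto the fitted components."""
    flat = bitmaps.reshape(len(bitmaps), -1).astype(float)
    return (flat - mean) @ components.T

def most_similar(vectors, query_index, top_n=3):
    """Indices of the top_n vectors closest to the query by cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-12
    unit = vectors / norms
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # exclude the query itself
    return np.argsort(-scores)[:top_n]

# Stand-in data: 20 random binary "glyphs" instead of rendered characters.
rng = np.random.default_rng(0)
glyphs = (rng.random((20, 16, 16)) > 0.5).astype(np.uint8)

# Make glyph 1 a near copy of glyph 0 (4 pixels flipped), mimicking a
# pair of visually similar characters.
glyphs[1] = glyphs[0]
glyphs[1, 0, :4] ^= 1

mean, components = fit_eigen_space(glyphs, k=8)
vectors = embed(glyphs, mean, components)
neighbours = most_similar(vectors, query_index=0)
print(neighbours)
```

Under these assumptions, the near-duplicate glyph should rank among the nearest neighbours of glyph 0, which is the kind of visual-similarity lookup the abstract describes.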
