Encoding Category Trees Into Word-Embeddings Using Geometric Approach
Tiansi Dong, Olaf Cremers, Hailong Jin, Juanzi Li, Christian Bauckhage, Armin B. Cremers, Daniel Speicher, Joerg Zimmermann
Code
- github.com/gnodisnait/bp94nball (official)
- github.com/gnodisnait/nball4tree (official)
Abstract
We present a novel method for implicitly encoding tree-structured category information into word-embeddings, resulting in super-dimensional ball representations (n-ball embeddings for short). Inclusion relations among n-balls precisely encode subordinate relations among categories. The cosine similarity function is enriched by category information. A large n-ball dataset is constructed using a geometric method that achieves zero energy cost in embedding tree structures into word-embeddings. A new benchmark dataset is created for predicting the category of unknown words. Experiments show that n-ball embeddings, which carry category information, significantly outperform word-embeddings in the neighbourhood test while changing the original word-embeddings only slightly. Experimental results also show that n-ball embeddings perform surprisingly well in validating the category of unknown words. Source code and datasets are publicly available at https://github.com/gnodisnait/nball4tree.git and https://github.com/gnodisnait/bp94nball.git.
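To illustrate how inclusion among balls can stand in for a subordinate relation, here is a minimal sketch of the underlying geometry. It is not the authors' construction method. It assumes each category is a ball given by a centre vector and a radius, and it uses the standard fact that one ball lies inside another when the distance between the centres plus the inner radius does not exceed the outer radius. The function names and the two-dimensional toy values are hypothetical and chosen only for readability.

```python
import math


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contains(outer_center, outer_radius, inner_center, inner_radius):
    """True if the inner ball lies entirely inside the outer ball."""
    return distance(outer_center, inner_center) + inner_radius <= outer_radius


def disjoint(center_a, radius_a, center_b, radius_b):
    """True if the two balls do not overlap (touching counts as disjoint)."""
    return distance(center_a, center_b) >= radius_a + radius_b


# Hypothetical toy balls (centre, radius); real n-balls live in far more dimensions.
balls = {
    "animal": ((0.0, 0.0), 10.0),
    "dog": ((3.0, 0.0), 2.0),
    "cat": ((-3.0, 0.0), 2.0),
}

dog_in_animal = contains(*balls["animal"], *balls["dog"])
cat_in_animal = contains(*balls["animal"], *balls["cat"])
dog_cat_disjoint = disjoint(*balls["dog"], *balls["cat"])
```

In this toy setting both child balls fall inside the parent ball and are disjoint from each other, which mirrors a tree in which two sibling categories share one parent.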