SOTAVerified

Sentence Selection Strategies for Distilling Word Embeddings from BERT

2021-11-16ACL ARR November 2021Unverified0· sign in to hype

Anonymous

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Many applications crucially rely on the avail-ability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word can be used. To this end, we analyze a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.

Tasks

Reproductions