Looking Into the Black Box - How Are Idioms Processed in BERT?

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

Idioms such as "call it a day" and "piece of cake" are frequent in natural language. How are they processed by language models such as BERT? This study investigates the question with two experiments: (1) an analysis, across all layers, of the embedding similarity between idiomatic sentences and their literal spelled-out counterparts, and (2) an analysis, across all layers, of a word's embedding when the word appears in an idiomatic versus a literal context. Experiment 1 shows that the embeddings of an idiomatic sentence and its spelled-out counterpart become more similar across the layers. When compared to random controls, the spelled-out counterpart is ranked highest in embedding similarity at layer 8. Experiment 2 shows that the embeddings of single words in idiomatic versus literal contexts diverge and are most different at layer 8. Overall, the study suggests that BERT "understands" idiomatic expressions even without context, and that it processes them more like a syntactic feature than a purely semantic one.
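The layer-wise comparison described in Experiment 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not say how sentence embeddings are pooled, so mean pooling over tokens is an assumption here, and the function names (`mean_pool`, `layerwise_similarity`, `counterpart_rank`) and the toy vectors are hypothetical. In practice the per-layer token vectors would come from BERT's hidden states.

```python
import math

def mean_pool(token_vectors):
    """Average a list of token vectors into one sentence vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def layerwise_similarity(layers_a, layers_b):
    """layers_x[l] holds the token vectors of sentence x at layer l."""
    return [cosine(mean_pool(a), mean_pool(b)) for a, b in zip(layers_a, layers_b)]

def counterpart_rank(idiom_layer, spelled_out_layer, control_layers):
    """1-based rank of the spelled-out sentence among controls at one layer."""
    query = mean_pool(idiom_layer)
    target = cosine(query, mean_pool(spelled_out_layer))
    better = sum(1 for c in control_layers if cosine(query, mean_pool(c)) > target)
    return better + 1

# Toy stand-in for hidden states: 2 layers, 2 tokens each, 2 dimensions.
idiom = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
spelled_out = [[[0.0, 1.0], [0.0, 1.0]], [[1.0, 1.0], [2.0, 2.0]]]
controls_last_layer = [[[1.0, 0.0]], [[1.0, -1.0]]]

sims = layerwise_similarity(idiom, spelled_out)
rank = counterpart_rank(idiom[1], spelled_out[1], controls_last_layer)
```

With the toy vectors, the two sentences are orthogonal at the first layer and aligned at the second, which mimics the trend the abstract reports; the rank function corresponds to the comparison against random controls.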

Tasks

Sentence, Word Embeddings
