
On the Sentence Embeddings from Pre-trained Language Models

2020-11-02 · EMNLP 2020 · Code Available

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li

Abstract

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from pre-trained language models without fine-tuning have been found to poorly capture the semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic textual similarity tasks. To address this issue, we propose to transform the anisotropic sentence embedding distribution into a smooth, isotropic Gaussian distribution through normalizing flows learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.
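To make the abstract's method concrete, below is a minimal sketch of the BERT-flow idea, not the paper's exact Glow-based implementation: a stack of RealNVP-style affine coupling layers is trained (with BERT frozen) to maximize the likelihood of sentence embeddings under a standard Gaussian prior, so the learned invertible map sends the anisotropic embedding space to an isotropic one. All names here (AffineCoupling, BertFlow, nll_loss) are illustrative, not from the released code.

```python
# Illustrative sketch of BERT-flow (assumptions: simplified RealNVP-style
# couplings instead of the paper's Glow flow; all names are hypothetical).
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Coupling layer: transforms half of the dimensions conditioned on the
    other half; the Jacobian log-determinant is cheap to compute."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)            # bound the scales for stability
        z2 = x2 * torch.exp(log_s) + t
        return torch.cat([x1, z2], dim=-1), log_s.sum(dim=-1)

class BertFlow(nn.Module):
    """Stack of coupling layers mapping sentence embeddings to a latent z
    that is trained to follow a standard Gaussian."""
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))

    def forward(self, x):
        log_det = torch.zeros(x.size(0), device=x.device)
        for layer in self.layers:
            x = x.flip(-1)                   # mix dimensions between couplings
            x, ld = layer(x)
            log_det = log_det + ld
        return x, log_det

def nll_loss(flow, emb):
    """Negative log-likelihood under a standard Gaussian prior:
    -log p(x) = 0.5 * ||z||^2 - log|det df/dx| (additive constant omitted)."""
    z, log_det = flow(emb)
    log_prior = -0.5 * (z ** 2).sum(dim=-1)
    return -(log_prior + log_det).mean()
```

At inference time, one would pass a pooled BERT sentence embedding (the paper uses averaged hidden states from the frozen encoder) through the trained flow and compute cosine similarity in the resulting isotropic latent space.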

Tasks

Semantic Textual Similarity

Benchmark Results

Dataset          Model                     Metric                 Claimed   Verified   Status
SICK             BERTbase-flow (NLI)       Spearman Correlation   0.65      -          Unverified
STS12            BERTlarge-flow (target)   Spearman Correlation   0.65      -          Unverified
STS13            BERTlarge-flow (target)   Spearman Correlation   0.73      -          Unverified
STS14            BERTlarge-flow (target)   Spearman Correlation   0.69      -          Unverified
STS15            BERTlarge-flow (target)   Spearman Correlation   0.75      -          Unverified
STS16            BERTlarge-flow (target)   Spearman Correlation   0.78      -          Unverified
STS Benchmark    BERTlarge-flow (target)   Spearman Correlation   0.72      -          Unverified
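
The Spearman numbers above follow the standard STS protocol: rank-correlate the model's similarity scores against the human-annotated gold scores. Below is a small, generic sketch of that computation; encode is a placeholder for any sentence encoder (such as BERT-flow embeddings), not a function from the repository.

```python
# Generic STS evaluation sketch (encode() is a hypothetical placeholder).
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(encode, sent_pairs, gold_scores):
    """encode: callable mapping a list of sentences to an (n, d) array."""
    a = encode([s1 for s1, _ in sent_pairs])
    b = encode([s2 for _, s2 in sent_pairs])
    # Cosine similarity per sentence pair.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sims = (a * b).sum(axis=1)
    # Rank correlation between model similarities and gold annotations.
    return spearmanr(sims, gold_scores).correlation
```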
