Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

2023-11-07

Sihao Chen, Hongming Zhang, Tong Chen, Ben Zhou, Wenhao Yu, Dian Yu, Baolin Peng, Hongwei Wang, Dan Roth, Dong Yu

Code

github.com/schen149/sub-sentence-encoder
Official, in paper, PyTorch, ★ 85

Abstract

We introduce the sub-sentence encoder, a contrastively learned contextual embedding model for fine-grained semantic representation of text. In contrast to the standard practice with sentence embeddings, where the meaning of an entire sequence of text is encoded into a fixed-length vector, the sub-sentence encoder learns to produce distinct contextual embeddings corresponding to different atomic propositions, i.e., atomic units of meaning expressed within a text sequence. The sub-sentence embeddings are contrastively learned to recognize (inferred) semantic equivalence between propositions across different text sequences. Our experiments show the effectiveness of sub-sentence encoders in applications such as retrieving supporting facts for fine-grained text attribution or recognizing the conditional semantic similarity between texts. In practice, we demonstrate that sub-sentence encoders have the same level of inference cost and space complexity as sentence encoders.
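
The idea of one embedding per proposition, trained contrastively, can be illustrated with a small toy sketch. This is not the authors' implementation (see the official repository for that). It assumes, purely for illustration, that a proposition is marked by a 0/1 token mask, that its embedding is the mean of the selected token vectors, and that training uses an InfoNCE-style loss over cosine similarities. The function names `masked_mean_pool` and `info_nce`, the toy vectors, and the temperature value are all hypothetical.

```python
import math

def masked_mean_pool(token_vecs, mask):
    """Average the token vectors selected by a 0/1 proposition mask (assumed pooling)."""
    selected = [v for v, m in zip(token_vecs, mask) if m]
    if not selected:
        raise ValueError("proposition mask selects no tokens")
    dim = len(selected[0])
    return [sum(v[i] for v in selected) / len(selected) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax score of the positive candidate."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denom - logits[0]

# Toy contextual token vectors for one sentence that expresses two propositions.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prop_a = masked_mean_pool(tokens, [1, 1, 0, 0])  # tokens of the first proposition
prop_b = masked_mean_pool(tokens, [0, 0, 1, 1])  # tokens of the second proposition

# Hypothetical embedding of a proposition from another text that restates prop_a.
paraphrase_a = [1.0, 0.05]

# Pairing the paraphrase with the matching proposition gives a lower loss
# than pairing it with the non-matching one.
loss_match = info_nce(prop_a, paraphrase_a, negatives=[prop_b])
loss_mismatch = info_nce(prop_b, paraphrase_a, negatives=[prop_a])
print(loss_match < loss_mismatch)
```

In this sketch the two propositions share one pass over the sentence's token vectors and differ only in their masks. Under that assumption, each proposition embedding has the same fixed length as a sentence embedding.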

Tasks

Contrastive Learning, Semantic Similarity, Semantic Textual Similarity, Sentence, Sentence Embeddings
