Universal Sentence Encoder

2018-03-29

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil

arXiv PDF

Code

Repository	Framework	Stars
github.com/ncbi-nlp/BioSentVec	none	611
github.com/ppapalampidi/SUMMER	pytorch	41
github.com/ppapalampidi/GraphTP	pytorch	30
github.com/asgaardlab/test-case-similarity-technique	tf	5
github.com/f-data/ADD	tf	0
github.com/Alleansa/eluvio	tf	0
github.com/ncbi-nlp/BioWordVec	none	0
github.com/idanmoradarthas/text-summarization	tf	0
github.com/martinomensio/spacy-universal-sentence-encoder	tf	0
github.com/joseph-bongo-220/TV_NLP_Project	tf	0

Abstract

We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
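
As a rough illustration of how sentence embedding vectors like those described in the abstract are typically compared, the sketch below computes cosine similarity between two vectors. It is not the authors' evaluation code. The `embed_sentences` helper is hypothetical, and the TF Hub module handle inside it is an assumption that does not appear on this page; it is imported lazily so the rest of the block runs without TensorFlow installed. The short vectors in the demo are hand-made placeholders, not real encoder output.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embed_sentences(sentences: List[str]) -> List[List[float]]:
    """Hypothetical helper that embeds sentences with a TF Hub module.

    Assumptions: tensorflow and tensorflow_hub are installed, and the
    module handle below (not given in the source page) is the one wanted.
    """
    import tensorflow_hub as hub  # lazy import: only needed for real embeddings

    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
    return model(sentences).numpy().tolist()


if __name__ == "__main__":
    # Toy 3-dimensional placeholders standing in for real sentence embeddings.
    similar_a = [1.0, 2.0, 0.0]
    similar_b = [2.0, 4.0, 0.0]
    unrelated = [0.0, 0.0, 3.0]
    print(cosine_similarity(similar_a, similar_b))
    print(cosine_similarity(similar_a, unrelated))
```

With real embeddings, one would call `embed_sentences` on a list of strings and pass pairs of the returned vectors to `cosine_similarity`; a higher score suggests closer meaning.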

Tasks

Conversational Response Selection, Semantic Textual Similarity, Sentence, Sentence Embeddings, Sentiment Analysis, Subjectivity Analysis, Text Classification, Transfer Learning, Word Embeddings

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
PolyAI Reddit	USE	1-of-100 Accuracy	47.7	—	Unverified
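
The page does not define the "1-of-100 Accuracy" metric in the table above. A minimal sketch follows, assuming the common reading for response selection: each context is scored against N candidate responses (N = 100 in the benchmark), exactly one of which is correct, and the metric is the fraction of contexts whose correct response gets the single highest score. The function name, the convention that the correct candidate for row i sits at column i, and the strict handling of ties are all assumptions made for illustration.

```python
from typing import Sequence


def one_of_n_accuracy(scores: Sequence[Sequence[float]]) -> float:
    """Fraction of rows whose diagonal entry is strictly the largest.

    scores[i][j] is the model's score for context i paired with candidate j.
    Assumption: the correct candidate for context i is at index i, as when
    the other responses in a batch serve as negatives. Ties count as misses.
    """
    if not scores:
        raise ValueError("scores must contain at least one row")
    hits = 0
    for i, row in enumerate(scores):
        if len(row) != len(scores):
            raise ValueError("scores must be a square matrix")
        best_other = max(
            (s for j, s in enumerate(row) if j != i), default=float("-inf")
        )
        if row[i] > best_other:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Toy 3x3 example (1-of-3 rather than 1-of-100) with made-up scores.
    toy_scores = [
        [0.9, 0.1, 0.0],  # correct candidate (index 0) ranks first
        [0.2, 0.3, 0.8],  # correct candidate (index 1) is outscored
        [0.1, 0.2, 0.7],  # correct candidate (index 2) ranks first
    ]
    print(one_of_n_accuracy(toy_scores))
```

In the toy matrix two of three contexts rank their correct response first, so the function returns 2/3. Reported values such as the 47.7 in the table are presumably this fraction expressed as a percentage.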
