QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings

2017-08-01 · SemEval 2017

Fanqing Meng, Wenpeng Lu, Yuteng Zhang, Jinyong Cheng, Yuehan Du, Shuwang Han


Abstract

This paper reports the details of our submissions to Task 1 of SemEval 2017, which assesses the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The runs differ only in the preprocessing applied to the evaluation data. The best of these systems achieves a Pearson correlation of 0.6887 on the evaluation data. Unsurprisingly, the results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words, and removal of stop words, is helpful and plays a significant role in improving model performance.
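
The abstract does not say how the word embeddings are combined into a sentence score, so the following is only a minimal sketch of one common unsupervised baseline, not the authors' system: average the embeddings of the preprocessed tokens and take the cosine similarity of the two sentence vectors. The toy vectors, the stop word list, the lemma table, and all function names are hypothetical and chosen for illustration. The `pearson` helper shows how system scores might be compared against gold scores, which is the evaluation measure the abstract names.

```python
import math

# Toy 3-dimensional vectors: hypothetical stand-ins for pretrained word embeddings.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleep": [0.1, 0.9, 0.0],
    "nap": [0.2, 0.8, 0.1],
    "car": [0.0, 0.1, 0.9],
    "drive": [0.1, 0.2, 0.8],
}

# Illustrative stop word list and lemma table (assumptions, not the paper's resources).
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "in"}
LEMMAS = {"cats": "cat", "sleeps": "sleep", "sleeping": "sleep",
          "naps": "nap", "cars": "car", "drives": "drive"}


def preprocess(sentence):
    """Tokenize on whitespace, lemmatize via the lookup table, drop stop words."""
    tokens = sentence.lower().replace(".", " ").split()
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    return [token for token in lemmas if token not in STOP_WORDS]


def sentence_vector(tokens):
    """Average the embeddings of known tokens; return None if none are known."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(sentence_a, sentence_b):
    """Score a sentence pair by the cosine of their averaged embeddings."""
    vec_a = sentence_vector(preprocess(sentence_a))
    vec_b = sentence_vector(preprocess(sentence_b))
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


def pearson(xs, ys):
    """Pearson correlation between system scores and gold scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)
```

In this sketch, `similarity("The cat sleeps.", "A kitten naps.")` scores higher than `similarity("The cat sleeps.", "The car drives.")`, because stop word removal and lemmatization leave only content words whose toy vectors lie close together in the first pair.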

Tasks

Lemmatization, Semantic Textual Similarity, Word Embeddings
