Distributed Document and Phrase Co-embeddings for Descriptive Clustering

2017-04-01 · EACL 2017

Motoki Sato, Austin J. Brockmeier, Georgios Kontonatsios, Tingting Mu, John Y. Goulermas, Jun'ichi Tsujii, Sophia Ananiadou

Abstract

Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
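The abstract describes two steps: grouping documents by their distributed (paragraph vector) representations, then choosing for each group the phrase from the shared co-embedding space that best represents it. The Python sketch below illustrates that pipeline under several assumptions that do not come from the paper. The vectors are tiny hand-made stand-ins for trained paragraph vectors. k-means with cosine-similarity assignment is swapped in as the clustering step. "Best represents" is taken to mean the highest cosine similarity to the cluster centroid. The helper names `cosine`, `mean`, `kmeans` and `describe_clusters` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(docs: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Assumed stand-in clustering step: k-means with cosine assignment
    and deterministic farthest-point initialisation."""
    centroids = [list(docs[0])]
    while len(centroids) < k:
        # Next seed: the document least similar to every existing seed.
        far = min(docs, key=lambda d: max(cosine(d, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    return labels


def describe_clusters(docs: List[Vector], labels: List[int],
                      phrases: Dict[str, Vector]) -> Dict[int, str]:
    """Assumed selection rule: label each cluster with the phrase whose
    co-embedded vector is most cosine-similar to the cluster centroid."""
    result: Dict[int, str] = {}
    for c in sorted(set(labels)):  # only non-empty clusters appear here
        centroid = mean([d for d, lab in zip(docs, labels) if lab == c])
        result[c] = max(phrases, key=lambda p: cosine(phrases[p], centroid))
    return result


# Hypothetical 3-d vectors standing in for paragraph-vector outputs.
docs = [
    [0.9, 0.1, 0.0], [1.0, 0.2, 0.1], [0.8, 0.0, 0.1],  # topic A
    [0.1, 0.9, 0.0], [0.0, 1.0, 0.2], [0.2, 0.8, 0.1],  # topic B
]
phrases = {
    "gene expression": [1.0, 0.1, 0.0],
    "neural network": [0.1, 1.0, 0.1],
    "stock market": [0.0, 0.1, 1.0],
}

labels = kmeans(docs, k=2)
print(labels)                                      # [0, 0, 0, 1, 1, 1]
print(describe_clusters(docs, labels, phrases))    # {0: 'gene expression', 1: 'neural network'}
```

The design choice worth noting is that, once documents and phrases share one embedding space, labelling a cluster reduces to a nearest-neighbour search among phrase vectors; that search is only meaningful because of the co-embedding.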

Tasks

Clustering, Descriptive, Information Retrieval, Semantic Textual Similarity