
Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

2017-01-16

Azarbonyad Hosein, Dehghani Mostafa, Kenter Tom, Marx Maarten, Kamps Jaap, de Rijke Maarten


Abstract

A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models to measure the diversity of documents is suboptimal because of two problems: generality and impurity. General topics include only common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models, and impure topics are likely to be assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments.
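
To make the three elements (words, topics, documents) concrete, the following is a minimal sketch of one way a document's topical diversity could be scored from a topic model's output. It assumes a Rao-style quadratic diversity, in which each pair of topics contributes its dissimilarity weighted by how much of the document both topics cover. The abstract does not specify the measure, so the function names `cosine_distance` and `rao_diversity`, the choice of cosine distance, and the toy inputs are all illustrative assumptions, not the authors' implementation.

```python
from math import sqrt


def cosine_distance(u, v):
    """Cosine distance between two topic-word vectors (assumed dissimilarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Treat an empty topic as maximally distant from everything.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rao_diversity(doc_topics, topic_words):
    """Sum over topic pairs of p(i|d) * p(j|d) * distance(i, j).

    doc_topics:  list of topic probabilities for one document.
    topic_words: list of topic-word probability vectors, one per topic.
    """
    total = 0.0
    for i, p_i in enumerate(doc_topics):
        for j, p_j in enumerate(doc_topics):
            total += p_i * p_j * cosine_distance(topic_words[i], topic_words[j])
    return total


# Toy example: two topics over a four-word vocabulary with no shared words.
topic_words = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
focused_doc = [1.0, 0.0]   # covers a single topic
mixed_doc = [0.5, 0.5]     # covers both topics equally

print(rao_diversity(focused_doc, topic_words))
print(rao_diversity(mixed_doc, topic_words))
```

Under this measure a document concentrated on a single topic scores lower than one spread across dissimilar topics. It also suggests why generality and impurity matter: a general topic assigned to most documents, or stray words that distort the distances between topics, shifts every pairwise term in the sum.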
