
Towards Improving Topic Models with the BERT-based Neural Topic Encoder

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

Neural Topic Models (NTMs) are popular for mining a set of topics from a collection of documents. Recently, an emerging direction combines NTMs with pre-trained language models such as BERT, aiming to use the contextual information of BERT to help train better NTMs. However, existing works in this direction either use the contextual information of pre-trained language models as the input of NTMs or align the outputs of the two kinds of models. In this paper, we study how to build deeper interactions between NTMs and pre-trained language models, and propose a BERT-based neural topic encoder that deeply integrates with the transformer layers of BERT. Our proposed encoder encodes both the BoW data and the word sequence of a document, two views that complement each other for learning a better topic distribution for the document. The proposed encoder is a drop-in alternative to the encoders used in existing NTMs. Thanks to the in-depth integration with BERT, extensive experiments show that the proposed model achieves state-of-the-art performance in comparisons with many advanced models.
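The abstract describes an encoder that consumes both a bag-of-words (BoW) vector and the word sequence of a document to produce a topic distribution. As a minimal, hypothetical sketch of that idea (not the paper's actual architecture), the two views can be embedded separately, fused, and projected through a softmax over topics; all dimensions, weights, and the concatenation-based fusion below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: fuse a BoW vector with a contextual embedding
# (e.g. a pooled BERT output) to infer a document's topic distribution.
# Shapes and the concatenation fusion are assumptions for illustration,
# not the architecture proposed in the paper.

rng = np.random.default_rng(0)

VOCAB, CTX_DIM, HIDDEN, N_TOPICS = 2000, 768, 256, 50

# Randomly initialised weights stand in for trained parameters.
W_bow = rng.normal(0.0, 0.02, (VOCAB, HIDDEN))
W_ctx = rng.normal(0.0, 0.02, (CTX_DIM, HIDDEN))
W_out = rng.normal(0.0, 0.02, (2 * HIDDEN, N_TOPICS))

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encode(bow, ctx):
    """Map a BoW vector and a contextual embedding to a topic distribution."""
    h = np.concatenate([np.tanh(bow @ W_bow), np.tanh(ctx @ W_ctx)], axis=-1)
    return softmax(h @ W_out)

# One synthetic document: sparse word counts plus a dense contextual vector.
bow = rng.poisson(0.01, VOCAB).astype(float)
ctx = rng.normal(0.0, 1.0, CTX_DIM)
theta = encode(bow, ctx)
print(theta.shape)          # one probability per topic
print(round(float(theta.sum()), 6))  # sums to 1
```

In the paper's setting the contextual view would come from BERT's transformer layers rather than a random vector, and the fusion would happen inside those layers rather than by simple concatenation; this sketch only illustrates why the two views can carry complementary signal (word frequencies vs. word order and context).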
