Elastic deep autoencoder for text embedding clustering by an improved graph regularization

2023-09-23 · Expert Systems with Applications (journal), 2023

Fatemeh Daneshfar, Sayvan Soleymanbaigi, Ali Nafisi, Pedram Yamini


Abstract

Text clustering groups pieces of text into clusters according to the information extracted from them. It has many applications, including recommender systems and sentiment analysis. Deep learning-based methods have become increasingly popular because they identify nonlinear structure with high accuracy. They usually consist of two major parts: dimensionality reduction and clustering. Autoencoders are simple unsupervised neural networks that learn better low-dimensional representations of data and have performed well on nonlinear features. However, because they rely on the Frobenius norm, they cope well with Gaussian noise but are sensitive to outliers and Laplacian noise. This paper proposes a deep autoencoder with an adapted elastic loss for text embedding clustering (EDA-TEC). The elastic loss combines the Frobenius norm and the L2,1-norm so that both types of noise are taken into account. To preserve the geometric structure of the high-dimensional data, a modified graph regularization term based on a weighted cosine similarity measure is also used. EDA-TEC further improves clustering results by applying sparsity regularization to the manifold representation of the data. This joint, end-to-end deep learning model achieves better representations and higher text clustering accuracy on common datasets than existing methods.
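The abstract describes the objective only in words. The following is a minimal NumPy sketch of how such a composite loss could be assembled. It assumes that samples are the rows of `X` (document embeddings), that `X_hat` is the decoder's reconstruction, and that `H` is the bottleneck representation. All of the following are hypothetical choices made for illustration, not the authors' formulation: the function names, the trade-off weights `alpha`, `beta`, `gamma`, the k-nearest-neighbour graph with plain cosine weights (standing in for the paper's weighted cosine similarity), and the L1 form of the sparsity term.

```python
import numpy as np


def frobenius_sq(residual: np.ndarray) -> float:
    """Squared Frobenius norm: sum of squared entries (suits Gaussian noise)."""
    return float(np.sum(residual ** 2))


def l21_norm(residual: np.ndarray) -> float:
    """L2,1 norm: sum of the Euclidean norms of the rows (one row per sample).

    An outlier row contributes linearly rather than quadratically, which is
    why this term is less sensitive to outliers and Laplacian-like noise.
    """
    return float(np.sum(np.linalg.norm(residual, axis=1)))


def cosine_knn_graph(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Symmetric k-nearest-neighbour graph weighted by cosine similarity.

    Assumption: plain cosine weights stand in for the paper's weighted
    cosine similarity measure, whose exact form is not given in the abstract.
    """
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # exclude each sample from its own neighbours
    idx = np.argsort(-S, axis=1)[:, :k]  # k most similar samples per row
    rows = np.arange(X.shape[0])[:, None]
    W = np.zeros_like(S)
    W[rows, idx] = np.clip(S[rows, idx], 0.0, None)  # keep non-negative weights
    return np.maximum(W, W.T)  # symmetrise the graph


def graph_regularizer(H: np.ndarray, W: np.ndarray) -> float:
    """tr(H^T L H) with L = D - W, i.e. 0.5 * sum_ij W_ij * ||h_i - h_j||^2."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(H.T @ L @ H))


def elastic_objective(X, X_hat, H, W, alpha=1.0, beta=0.1, gamma=0.01) -> float:
    """Hypothetical composite loss: elastic reconstruction + graph + sparsity.

    alpha, beta and gamma are illustrative weights, not values from the paper;
    the L1 penalty on H is an assumed form of the sparsity regularization.
    """
    R = X - X_hat
    elastic = frobenius_sq(R) + alpha * l21_norm(R)
    return elastic + beta * graph_regularizer(H, W) + gamma * float(np.abs(H).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))                 # 20 toy "document embeddings"
    X_hat = X + 0.1 * rng.normal(size=X.shape)   # stand-in decoder output
    H = rng.normal(size=(20, 3))                 # stand-in bottleneck codes
    W = cosine_knn_graph(X, k=4)
    print(elastic_objective(X, X_hat, H, W))
```

The L2,1 term is taken row-wise, so a single corrupted document is penalized by the norm of its residual rather than by its square. The graph term is small when documents that are similar in the input space receive nearby codes in `H`. In an end-to-end model these terms would be written in an autodiff framework and minimized jointly with a clustering loss, which this sketch omits.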

Tasks

Clustering, Dimensionality Reduction, Recommendation Systems, Sentiment Analysis, Text Clustering
