Reliable Pseudo-labeling via Optimal Transport with Attention for Short Text Clustering

2025-01-25

Zhihao Yao, Jixuan Yin, Bo Li

Abstract

Short text clustering has gained significant attention in the data mining community. However, the limited information contained in short texts often leads to poorly discriminative representations, which makes clustering more difficult. This paper proposes a novel short text clustering framework, called Reliable Pseudo-labeling via Optimal Transport with Attention for Short Text Clustering (POTA), that generates reliable pseudo-labels to aid discriminative representation learning for clustering. Specifically, POTA first implements an instance-level attention mechanism to capture the semantic relationships among samples, which are then incorporated as a semantic consistency regularization term into an optimal transport (OT) problem. Solving this OT problem yields reliable pseudo-labels that simultaneously account for sample-to-sample semantic consistency and sample-to-cluster global structure information. Additionally, the proposed OT formulation can adaptively estimate cluster distributions, making POTA well-suited to datasets with varying degrees of imbalance. The pseudo-labels are then used to guide contrastive learning, which generates discriminative representations and achieves efficient clustering. Extensive experiments demonstrate that POTA outperforms state-of-the-art methods. The code is available at: https://github.com/YZH0905/POTA-STC/tree/main.
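
To make the idea of OT-based pseudo-labeling more concrete, the following is a minimal sketch of the generic technique, not the POTA implementation. It assumes a plain entropic OT problem solved with Sinkhorn iterations and a fixed uniform cluster marginal; the attention-based semantic consistency term and the adaptive estimation of cluster distributions described in the abstract are deliberately left out. The function names `sinkhorn_pseudo_labels` and `hard_labels`, the parameters `epsilon` and `n_iters`, and the toy probabilities are all hypothetical.

```python
def sinkhorn_pseudo_labels(probs, epsilon=0.1, n_iters=200):
    """Entropic OT between samples and clusters (illustrative sketch).

    probs: list of rows, each a list of predicted cluster probabilities.
    Returns a transport plan whose rows sum to 1/n and whose columns sum
    to 1/k. The uniform cluster marginal is an assumption of this sketch;
    the abstract describes an OT formulation that estimates it adaptively.
    """
    n, k = len(probs), len(probs[0])
    # Gibbs kernel for the cost C = -log(p): exp(-C / epsilon) = p ** (1 / epsilon).
    kernel = [[max(p, 1e-12) ** (1.0 / epsilon) for p in row] for row in probs]
    row_target, col_target = 1.0 / n, 1.0 / k
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows so that every sample carries mass 1/n.
        u = [row_target / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        # Rescale columns so that every cluster receives mass 1/k.
        v = [col_target / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


def hard_labels(matrix):
    """Index of the largest entry in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy predictions: every sample prefers cluster 0, so a plain argmax collapses.
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
naive = hard_labels(probs)                              # [0, 0, 0, 0]
balanced = hard_labels(sinkhorn_pseudo_labels(probs))   # [0, 0, 1, 1]
print(naive, balanced)
```

In this toy case the marginal constraint moves the two least confident samples to the second cluster, which illustrates how an OT step can inject sample-to-cluster global structure into pseudo-labels. With a fixed uniform marginal this only suits balanced data, which is the limitation the adaptive cluster-distribution estimate in the abstract is meant to address.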