Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures

2025-05-17 · Code Available

Zhiheng Chen, Ruofan Wu, Guanhua Fang


Code

github.com/rorschach1989/transformer-for-gmm
Official · In paper · PyTorch · ★ 3

Abstract

The transformer architecture has demonstrated remarkable capabilities in modern artificial intelligence. Among these, the ability to implicitly learn an internal model at inference time is widely believed to play a key role in the understanding of pre-trained large language models. However, most recent works have focused on supervised learning topics such as in-context learning, leaving the field of unsupervised learning largely unexplored. This paper investigates the capabilities of transformers in solving Gaussian Mixture Models (GMMs), a fundamental unsupervised learning problem, through the lens of statistical estimation. We propose a transformer-based learning framework called TGMM that simultaneously learns to solve multiple GMM tasks using a shared transformer backbone. The learned models are empirically demonstrated to effectively mitigate the limitations of classical methods such as Expectation-Maximization (EM) or spectral algorithms, while exhibiting reasonable robustness to distribution shifts. Theoretically, we prove that transformers can approximate both the EM algorithm and a core component of spectral methods (cubic tensor power iterations). These results bridge the gap between practical success and theoretical understanding, positioning transformers as versatile tools for unsupervised learning.
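For readers unfamiliar with the classical baseline named in the abstract, the following is a minimal sketch of one EM iteration for a Gaussian mixture. It is not the TGMM method and is not taken from the paper's repository. It assumes isotropic components with a known, shared variance `sigma2`, and the function name `em_step` and its parameters are hypothetical choices made only for illustration.

```python
import numpy as np


def em_step(X, weights, means, sigma2=1.0):
    """One EM iteration for a mixture of isotropic Gaussians N(mu_k, sigma2 * I).

    Simplifying assumption (not from the paper): the variance sigma2 is known
    and shared, so only the mixing weights and the means are updated.
    X: (n, d) data, weights: (K,) mixing weights, means: (K, d) component means.
    """
    # E-step: unnormalised log-responsibilities log(pi_k) - ||x_i - mu_k||^2 / (2 sigma2).
    # The Gaussian normalising constant is identical for every k and cancels.
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # shape (n, K)
    log_resp = np.log(weights)[None, :] - sq_dist / (2.0 * sigma2)
    log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)  # each row now sums to 1

    # M-step: responsibility-weighted updates of weights and means.
    nk = resp.sum(axis=0)  # effective number of points per component
    new_weights = nk / X.shape[0]
    new_means = (resp.T @ X) / nk[:, None]
    return new_weights, new_means, resp


if __name__ == "__main__":
    # Synthetic two-component data with hypothetical centres at (-5, 0) and (5, 0).
    rng = np.random.default_rng(0)
    X = np.vstack([
        rng.normal(loc=[-5.0, 0.0], scale=1.0, size=(200, 2)),
        rng.normal(loc=[5.0, 0.0], scale=1.0, size=(200, 2)),
    ])
    weights = np.array([0.5, 0.5])
    means = np.array([[-1.0, 0.0], [1.0, 0.0]])  # deliberately rough initialisation
    for _ in range(20):
        weights, means, _ = em_step(X, weights, means)
    print("weights:", weights)
    print("means:", means)
```

The E-step is a row-wise softmax over components, which gives some intuition for why attention-style layers are a natural candidate for approximating it. The approximation result itself is the paper's claim and is not demonstrated by this sketch.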

Tasks

In-Context Learning
