On the Provable Advantage of Unsupervised Pretraining
Jiawei Ge, Shange Tang, Jianqing Fan, Chi Jin
Abstract
Unsupervised pretraining, which learns a useful representation using a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems. Despite its tremendous empirical success, rigorous theoretical understanding of why unsupervised pretraining generally helps remains rather limited -- most existing results are restricted to particular methods or approaches for unsupervised pretraining with specialized structural assumptions. This paper studies a generic framework in which the unsupervised representation learning task is specified by an abstract class of latent variable models $\Phi$ and the downstream task is specified by a class of prediction functions $\Psi$. We consider a natural approach of using Maximum Likelihood Estimation (MLE) for unsupervised pretraining and Empirical Risk Minimization (ERM) for learning downstream tasks. We prove that, under a mild "informative" condition, our algorithm achieves an excess risk of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_\Phi/m} + \sqrt{\mathcal{C}_\Psi/n})$ for downstream tasks, where $\mathcal{C}_\Phi, \mathcal{C}_\Psi$ are complexity measures of the function classes $\Phi, \Psi$, and $m, n$ are the numbers of unlabeled and labeled data points, respectively. Compared to the baseline of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_{\Phi \circ \Psi}/n})$ achieved by performing supervised learning using only the labeled data, our result rigorously shows the benefit of unsupervised pretraining when $m \gg n$ and $\mathcal{C}_{\Phi \circ \Psi} > \mathcal{C}_\Psi$. This paper further shows that our generic framework covers a wide range of approaches for unsupervised pretraining, including factor models, Gaussian mixture models, and contrastive learning.
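To make the MLE-then-ERM pipeline concrete, below is a minimal sketch under the paper's Gaussian mixture instantiation: a mixture model is fit by maximum likelihood (via EM) on a large unlabeled set, its posterior responsibilities serve as the learned representation, and a downstream classifier is trained by ERM on a small labeled set. The synthetic data, scikit-learn components, and all hyperparameters here are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal illustrative sketch (not the paper's code): unsupervised pretraining
# by MLE of a Gaussian mixture model on unlabeled data, followed by ERM
# (logistic regression) on the learned representation using few labels.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic data from a 3-component mixture; the label is the component id.
X, y = make_blobs(n_samples=20_000, centers=3, n_features=20,
                  cluster_std=4.0, random_state=0)

# Split into m unlabeled points (large), n labeled points (small), and a test set.
X_unlab, X_rest, _, y_rest = train_test_split(X, y, train_size=0.95,
                                              random_state=0)
X_lab, X_test, y_lab, y_test = train_test_split(X_rest, y_rest, train_size=100,
                                                random_state=0)

# Unsupervised pretraining: MLE (via EM) of a latent variable model on unlabeled data.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_unlab)

# Learned representation: posterior probabilities over the latent components.
Z_lab, Z_test = gmm.predict_proba(X_lab), gmm.predict_proba(X_test)

# Downstream ERM on the pretrained representation, using only the n labeled points.
clf_pretrained = LogisticRegression(max_iter=1000).fit(Z_lab, y_lab)

# Baseline: supervised ERM on raw features with the same n labeled points.
clf_baseline = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

print("accuracy with pretrained representation:", clf_pretrained.score(Z_test, y_test))
print("accuracy with raw features only:", clf_baseline.score(X_test, y_test))
```

Roughly speaking, the class of Gaussian mixture models plays the role of $\Phi$ and the linear classifiers applied on top of the learned representation play the role of $\Psi$ in this instantiation; the comparison of the two printed accuracies is only meant to illustrate the $m \gg n$ regime the bound addresses, not to reproduce any result from the paper.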