Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning

2024-02-22

Johnathan Xie, Yoonho Lee, Annie S. Chen, Chelsea Finn


Code

github.com/johnathan-xie/sma
Official implementation (linked in the paper) · JAX · ★ 11

Abstract

Self-supervised learning excels at learning representations from large amounts of unlabeled data, demonstrating success across multiple data modalities. Yet, extending self-supervised learning to new modalities is non-trivial because the specifics of existing methods are tailored to each domain, such as domain-specific augmentations which reflect the invariances in the target task. While masked modeling is promising as a domain-agnostic framework for self-supervised learning because it does not rely on input augmentations, its mask sampling procedure remains domain-specific. We present Self-guided Masked Autoencoders (SMA), a fully domain-agnostic masked modeling method. SMA trains an attention-based model using a masked modeling objective, learning the masks to sample without any domain-specific assumptions. We evaluate SMA on three self-supervised learning benchmarks in protein biology, chemical property prediction, and particle physics. We find SMA is capable of learning representations without domain-specific knowledge and achieves state-of-the-art performance on these three benchmarks.
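The abstract's key idea is that the masks come from the model's own attention rather than from hand-designed, domain-specific rules. The NumPy sketch below illustrates one way such attention-guided masking could work; it is not the authors' implementation. It assumes a hypothetical attention map of shape `(num_queries, num_tokens)`, picks a random subset of queries, and masks the input tokens those queries attend to most. The function names, the query-selection rule, the mask-token replacement, and the mask ratio are all illustrative assumptions.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_guided_mask(attn, num_selected_queries, mask_ratio, rng):
    """Choose which input tokens to mask using only an attention map.

    attn: (num_queries, num_tokens) weights; ASSUMED layout where each row
        is one query's softmax over the input tokens.
    Returns a boolean array of shape (num_tokens,), True = masked.
    """
    num_queries, num_tokens = attn.shape
    num_masked = int(round(mask_ratio * num_tokens))
    # Hypothetical rule: sample a few queries without replacement and
    # sum their attention, so tokens they jointly attend to are grouped.
    chosen = rng.choice(num_queries, size=num_selected_queries, replace=False)
    scores = attn[chosen].sum(axis=0)
    # Mask the highest-scoring tokens; no domain-specific rule is involved.
    masked_idx = np.argsort(-scores, kind="stable")[:num_masked]
    mask = np.zeros(num_tokens, dtype=bool)
    mask[masked_idx] = True
    return mask


def apply_mask(x, mask, mask_token):
    # Replace masked token embeddings with a shared mask embedding.
    return np.where(mask[:, None], mask_token[None, :], x)


def masked_reconstruction_loss(pred, target, mask):
    # Mean squared error computed only over the masked positions.
    per_token = ((pred - target) ** 2).mean(axis=-1)
    return per_token[mask].mean()


# Usage with random stand-ins for token embeddings and attention weights.
rng = np.random.default_rng(0)
num_queries, num_tokens, dim = 8, 20, 4
x = rng.normal(size=(num_tokens, dim))
attn = softmax(rng.normal(size=(num_queries, num_tokens)), axis=-1)
mask = attention_guided_mask(attn, num_selected_queries=2, mask_ratio=0.5, rng=rng)
x_masked = apply_mask(x, mask, mask_token=np.zeros(dim))
# x_masked stands in for a model's reconstruction here.
loss = masked_reconstruction_loss(x_masked, x, mask)
```

Because the mask is derived from attention weights alone, the same routine applies unchanged whether the tokens are amino acids, atoms, or particle features; that independence from the input domain is the property the abstract emphasizes.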

Tasks

Molecular Property Prediction, Property Prediction, Self-Supervised Learning

Benchmark Results

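Dataset	Model	Metric	Claimed (paper)	Verified (reproduction)	Status
BACE	SMA	ROC-AUC (%, higher is better)	84.3	n/a	Unverified
BBBP	SMA	ROC-AUC (%, higher is better)	75	n/a	Unverified
ESOL	SMA	RMSE (lower is better)	0.62	n/a	Unverified
FreeSolv	SMA	RMSE (lower is better)	1.09	n/a	Unverified
HIV	SMA	AUC (fraction, higher is better)	0.79	n/a	Unverified
Lipophilicity	SMA	RMSE (lower is better)	0.61	n/a	Unverified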
