ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

2021-06-11

Xiaomin Fang, Lihang Liu, Jieqiong Lei, Donglong He, Shanzhuo Zhang, Jingbo Zhou, Fan Wang, Hua Wu, Haifeng Wang

Abstract

Effective molecular representation learning is of great importance for molecular property prediction, a fundamental task in the drug and materials industries. Recent advances in graph neural networks (GNNs) have shown great promise for molecular representation learning. Moreover, several recent studies have shown that self-supervised learning methods can pre-train GNNs to overcome the shortage of labeled molecules. However, existing GNNs and pre-training strategies usually treat molecules as topological graphs without fully utilizing molecular geometry information. This omission matters because the three-dimensional (3D) spatial structure of a molecule, a.k.a. its molecular geometry, is one of the most critical factors determining its physical, chemical, and biological properties. To this end, we propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL). First, we design a geometry-based GNN architecture that simultaneously models the atoms, bonds, and bond angles of a molecule. Specifically, we construct two graphs for each molecule: the first encodes atom-bond relations, and the second encodes bond-angle relations. Second, on top of this GNN architecture, we propose several novel geometry-level self-supervised learning strategies that learn spatial knowledge from the local and global 3D structures of molecules. We compare ChemRL-GEM with various state-of-the-art (SOTA) baselines on different molecular benchmarks and show that ChemRL-GEM significantly outperforms all baselines on both regression and classification tasks. For example, it achieves an average improvement of 8.8% over the SOTA baselines on the regression tasks, demonstrating the superiority of the proposed method.
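
The abstract describes the two graphs only in words. The Python sketch below shows one way to derive them from 3D atom coordinates and a bond list. It is an illustration under stated assumptions, not the authors' implementation. The function names (`bond_angle`, `build_dual_graphs`, `pairwise_distances`) are hypothetical, and the coordinates are assumed to come from a precomputed conformer. The sketch also assumes that bond lengths and bond angles stand for "local" geometry and that all-pair atomic distances stand for "global" geometry. Such quantities could serve, for example, as self-supervised regression targets.

```python
import math
from itertools import combinations

# Hypothetical helpers for illustration only; not the authors' code.


def bond_angle(p_a, p_center, p_b):
    """Angle (radians) at p_center between the vectors center->a and center->b."""
    u = [a - c for a, c in zip(p_a, p_center)]
    v = [b - c for b, c in zip(p_b, p_center)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def build_dual_graphs(coords, bonds):
    """Build the two graphs from 3D coordinates and a list of (i, j) bonds.

    Atom-bond graph: nodes are atoms, edges are `bonds`, edge feature = bond length.
    Bond-angle graph: nodes are bonds; two bonds sharing an atom are joined by an
    edge whose feature is the angle between them.
    """
    bond_lengths = [math.dist(coords[i], coords[j]) for i, j in bonds]

    incident = {}  # atom index -> indices of the bonds touching that atom
    for k, (i, j) in enumerate(bonds):
        incident.setdefault(i, []).append(k)
        incident.setdefault(j, []).append(k)

    angle_edges = []  # (bond k1, bond k2, angle in radians)
    for center, ks in incident.items():
        for k1, k2 in combinations(ks, 2):
            # the far endpoint of each bond, as seen from the shared atom
            a = bonds[k1][1] if bonds[k1][0] == center else bonds[k1][0]
            b = bonds[k2][1] if bonds[k2][0] == center else bonds[k2][0]
            angle_edges.append((k1, k2, bond_angle(coords[a], coords[center], coords[b])))
    return bond_lengths, angle_edges


def pairwise_distances(coords):
    """All atom-atom distances: a whole-molecule (global) geometric summary."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Usage: water, O at the origin with two 0.9572 Å O-H bonds 104.52 degrees apart.
theta = math.radians(104.52)
water = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0),
         (0.9572 * math.cos(theta), 0.9572 * math.sin(theta), 0.0)]
lengths, angles = build_dual_graphs(water, [(0, 1), (0, 2)])
print([round(x, 4) for x in lengths])  # [0.9572, 0.9572]
print([(k1, k2, round(math.degrees(t), 2)) for k1, k2, t in angles])  # [(0, 1, 104.52)]
```

Because the nodes of the bond-angle graph are bonds, an atom with d incident bonds contributes d(d-1)/2 angle edges. A GNN can then pass messages over both graphs, for instance updating bond representations from their angle neighbors and atom representations from their incident bonds.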

Tasks

Molecular Property Prediction, Molecular Representation, Property Prediction, Regression, Representation Learning, Self-Supervised Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
BACE	ChemRL-GEM	ROC-AUC	85.6	—	Unverified
BBBP	ChemRL-GEM	ROC-AUC	72.4	—	Unverified
ClinTox	ChemRL-GEM	ROC-AUC	90.1	—	Unverified
ESOL	ChemRL-GEM	RMSE	0.8	—	Unverified
FreeSolv	ChemRL-GEM	RMSE	1.88	—	Unverified
Lipophilicity	ChemRL-GEM	RMSE	0.66	—	Unverified
QM7	ChemRL-GEM	MAE	58.9	—	Unverified
QM8	ChemRL-GEM	MAE	0.02	—	Unverified
QM9	ChemRL-GEM	MAE	0.01	—	Unverified
SIDER	ChemRL-GEM	ROC-AUC	67.2	—	Unverified
Tox21	ChemRL-GEM	ROC-AUC	78.1	—	Unverified
ToxCast	ChemRL-GEM	ROC-AUC	69.2	—	Unverified
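
The table mixes three metrics. ROC-AUC is used for the classification datasets (BACE, BBBP, ClinTox, SIDER, Tox21, ToxCast), where higher is better. RMSE or MAE is used for the regression datasets, where lower is better. The ROC-AUC values appear to be reported as percentages (85.6 meaning 0.856). Below is a minimal reference sketch of the three metrics in plain Python. The function names are illustrative. For multi-task datasets such as Tox21, per-task scores are typically averaged, and this sketch leaves that step to the caller.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error (the metric listed for ESOL, FreeSolv, Lipophilicity)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error (the metric listed for QM7, QM8, QM9)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P*N) pairwise form is for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))        # ≈ 1.1547, i.e. sqrt(4/3)
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))         # ≈ 0.6667, i.e. 2/3
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```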
