N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

2018-06-24 · NeurIPS 2019 · Code Available

Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang

Abstract

Machine learning techniques have recently been adopted in various applications in medicine, biology, chemistry, and material engineering. An important task is to predict the properties of molecules, which serves as the main subroutine in many downstream applications such as virtual screening and drug design. Despite the increasing interest, the key challenge is to construct proper representations of molecules for learning algorithms. This paper introduces the N-gram graph, a simple unsupervised representation for molecules. The method first embeds the vertices in the molecule graph. It then constructs a compact representation for the graph by assembling the vertex embeddings in short walks in the graph, which we show is equivalent to a simple graph neural network that needs no training. The representations can thus be efficiently computed and then used with supervised learning methods for prediction. Experiments on 60 tasks from 10 benchmark datasets demonstrate its advantages over both popular graph neural networks and traditional representation methods. This is complemented by theoretical analysis showing its strong representation and prediction power.
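The construction described in the abstract (embed the vertices, then assemble the embeddings along short walks) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: a walk's embedding is taken to be the element-wise product of its vertex embeddings, walks of each length are summed, and the per-length sums are concatenated. The function name `n_gram_graph`, the parameter `max_n`, and the toy embeddings are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def n_gram_graph(adjacency, vertex_embeddings, max_n=3):
    """Training-free graph representation built from short walks (sketch).

    adjacency:         (V, V) 0/1 adjacency matrix of the molecule graph.
    vertex_embeddings: (r, V) matrix, one r-dimensional column per vertex.
    Returns a vector of length r * max_n.
    """
    A = np.asarray(adjacency, dtype=float)
    F1 = np.asarray(vertex_embeddings, dtype=float)

    # Column i of F holds the summed embeddings of all walks ending at vertex i.
    F = F1.copy()
    parts = [F.sum(axis=1)]  # walks with a single vertex
    for _ in range(2, max_n + 1):
        # Extend every walk by one edge (F @ A), then multiply in the embedding
        # of the newly reached vertex (element-wise product with F1).
        # Assumption: walks may revisit vertices.
        F = (F @ A) * F1
        parts.append(F.sum(axis=1))
    return np.concatenate(parts)

# Toy example: a 3-vertex path graph 0 - 1 - 2 with hand-picked embeddings.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
rep = n_gram_graph(A, E, max_n=3)

# Relabeling the vertices should leave the representation unchanged.
perm = [2, 0, 1]
rep_perm = n_gram_graph(A[np.ix_(perm, perm)], E[:, perm], max_n=3)
```

Each loop iteration is one matrix product followed by an element-wise product, which is the sense in which such a construction resembles a graph neural network layer with no trainable weights. The resulting vector can then be passed to any supervised learner.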

Benchmark Results

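| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| BACE | N-GramRF | ROC-AUC | 77.9 | | Unverified |
| BACE | N-GramXGB | ROC-AUC | 79.1 | | Unverified |
| BBBP | N-GramXGB | ROC-AUC | 69.1 | | Unverified |
| BBBP | N-GramRF | ROC-AUC | 69.7 | | Unverified |
| clintox | N-GramXGB | ROC-AUC | 87.5 | | Unverified |
| clintox | N-GramRF | ROC-AUC | 77.5 | | Unverified |
| FreeSolv | N-GramRF | RMSE | 2.69 | | Unverified |
| FreeSolv | N-GramXGB | RMSE | 5.06 | | Unverified |
| Lipophilicity | N-GramRF | RMSE | 0.81 | | Unverified |
| Lipophilicity | N-GramXGB | RMSE | 2.07 | | Unverified |
| QM7 | N-GramRF | MAE | 92.8 | | Unverified |
| QM7 | N-GramXGB | MAE | 81.9 | | Unverified |
| QM8 | N-GramXGB | MAE | 0.02 | | Unverified |
| QM8 | N-GramRF | MAE | 0.02 | | Unverified |
| QM9 | N-GramXGB | MAE | 0.01 | | Unverified |
| QM9 | N-GramRF | MAE | 0.01 | | Unverified |
| SIDER | N-GramRF | ROC-AUC | 66.8 | | Unverified |
| SIDER | N-GramXGB | ROC-AUC | 65.5 | | Unverified |
| Tox21 | N-GramXGB | ROC-AUC | 75.8 | | Unverified |
| Tox21 | N-GramRF | ROC-AUC | 74.3 | | Unverified |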
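
For readers attempting to reproduce the numbers above, the three metrics in the table can be computed as in the following sketch. It uses the standard textbook definitions. The page does not say how the reported values were obtained (splits, averaging across tasks, or scaling), so the function names and the assumption that ROC-AUC is reported on a 0 to 100 scale are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a random
    negative, counting ties as one half. Pairwise comparison, so memory use
    grows with (#positives x #negatives); fine for a sketch."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

# Toy labels and scores, not data from the paper.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc_percent = 100.0 * auc  # assumed scale of the ROC-AUC column above
err_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
err_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Higher is better for ROC-AUC, while lower is better for RMSE and MAE.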