Analyzing Learned Molecular Representations for Property Prediction

2019-04-02

Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, Regina Barzilay

Abstract

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.

Benchmark Results

Dataset        Model   Metric   Claimed  Verified  Status
BACE           D-MPNN  ROC-AUC  80.9               Unverified
BBBP           D-MPNN  ROC-AUC  71                 Unverified
clintox        D-MPNN  ROC-AUC  90.6               Unverified
ESOL           D-MPNN  RMSE     1.05               Unverified
FreeSolv       D-MPNN  RMSE     2.08               Unverified
Lipophilicity  D-MPNN  RMSE     0.68               Unverified
QM7            D-MPNN  MAE      103.5              Unverified
QM8            D-MPNN  MAE      0.02               Unverified
QM9            D-MPNN  MAE      0.01               Unverified
SIDER          D-MPNN  ROC-AUC  57                 Unverified
Tox21          D-MPNN  ROC-AUC  75.9               Unverified
ToxCast        D-MPNN  ROC-AUC  65.5               Unverified
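
For readers who want to check a reproduction against the claimed numbers, the three metrics named in the table (ROC-AUC, RMSE, MAE) can be sketched in plain Python as follows. This is a minimal illustration using the standard textbook definitions. The function names are hypothetical and are not taken from the paper's code. The ROC-AUC entries in the table look like percentages (for example 80.9), so multiplying the result of `roc_auc` by 100 is an assumption about how those values were scaled.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is O(n_pos * n_neg), which is fine for a sanity check.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For multi-task datasets, a common convention is to compute the metric per task and average across tasks. Whether the claimed values were aggregated that way is not stated on this page.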
