Improving Hit-finding: Multilabel Neural Architecture with DEL

2021-09-24 · NeurIPS Workshop AI4Scien 2021

Kehang Han, Steven Kearnes, Jin Xu, Wen Torng, JW Feng

Abstract

DNA-Encoded Library (DEL) data, often comprising millions of data points, enable large deep learning models to make real contributions to the drug discovery process (e.g., hit-finding). The current state-of-the-art method for modeling DEL data, the GCNN multiclass model, requires domain experts to create mutually exclusive classification labels from the multiple selection readouts of DEL data, an assumption that does not always suit the problem. In this work, we designed a GCNN multilabel architecture that directly models each selection readout, eliminating this dependency on human expertise. We chose key modeling components, such as the label reduction scheme, based on in silico evaluation. To assess performance in real-world drug discovery settings, we further carried out prospective wet-lab testing, in which the multilabel model shows consistent improvement in hit rate (the percentage of hits in a proposed molecule list) over the current state-of-the-art multiclass model.
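
The contrast between the two formulations, and the hit-rate metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the GCNN itself is omitted, the logits and labels are made-up values, and the function names (`multiclass_loss`, `multilabel_loss`, `hit_rate`) are hypothetical. It assumes a multiclass head is trained with softmax cross-entropy over mutually exclusive classes, while a multilabel head uses one independent sigmoid output per selection readout.

```python
import math


def softmax(logits):
    """Normalize logits into probabilities that sum to 1 (mutually exclusive classes)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Map one logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def multiclass_loss(logits, class_index):
    """Cross-entropy when each molecule gets exactly one expert-defined class."""
    return -math.log(softmax(logits)[class_index])


def multilabel_loss(logits, labels):
    """Mean binary cross-entropy with one independent output per selection readout."""
    losses = []
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


def hit_rate(is_hit):
    """Percentage of hits in a proposed molecule list."""
    return 100.0 * sum(is_hit) / len(is_hit)


# Hypothetical model outputs for one molecule (illustrative values only).
logits = [1.2, -0.4, 0.3]

# Multiclass: the molecule must be assigned to a single class, here class 0.
print(multiclass_loss(logits, class_index=0))

# Multilabel: each selection readout is labeled on its own, so several can be positive.
print(multilabel_loss(logits, labels=[1, 0, 1]))

# Hit rate for a proposed list in which 2 of 4 tested molecules are hits.
print(hit_rate([True, True, False, False]))  # 50.0
```

In the multilabel form the labels `[1, 0, 1]` need no reduction to a single class, which is the dependency on expert-defined, mutually exclusive labels that the abstract describes removing.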
