
Regression modeling on DNA encoded libraries

2021-09-24 · NeurIPS Workshop AI4Scien 2021

Ralph Ma, Gabriel Hart Stocker Dreiman, Fiorella Ruggiu, Adam Joseph Riesselman, Bowen Liu, Keith James, Mohammad Sultan, Daphne Koller


Abstract

DNA encoded libraries (DELs) are pooled, combinatorial compound collections where each member is tagged with its own unique DNA barcode. DELs are used in drug discovery for early hit finding against protein targets. Recently, several groups have proposed building machine learning models with quantities derived from DEL datasets. However, DEL datasets have a low signal-to-noise ratio, which makes modeling them challenging. To address this, we propose a novel graph neural network (GNN) based regression model that directly predicts enrichment scores from raw sequencing counts while accounting for multiple sources of technical variation and intrinsic assay noise. We show that our GNN regression model quantitatively outperforms standard classification approaches and can be used to find diverse sets of molecules in external virtual libraries.
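To make the notion of an enrichment score derived from raw sequencing counts more concrete, here is a minimal sketch in Python. It is not the model proposed in the paper: the abstract does not give a formula, so the depth-normalized log ratio with a pseudocount used here is an assumption chosen for illustration, and the function name `log_enrichment` and all of its parameters are hypothetical.

```python
import math


def log_enrichment(target_count, control_count, target_total, control_total,
                   pseudocount=1.0):
    """Illustrative log2 enrichment of one compound (assumed definition).

    target_count / control_count: raw read counts for the compound's barcode
        in the target selection and in a control selection.
    target_total / control_total: total reads in each selection, used to
        normalize for sequencing depth.
    pseudocount: added to both counts so that zero counts do not produce
        a division by zero or log of zero.
    """
    target_rate = (target_count + pseudocount) / target_total
    control_rate = (control_count + pseudocount) / control_total
    return math.log2(target_rate / control_rate)


# A barcode seen 7 times with the target and once in the control,
# at equal sequencing depth.
score = log_enrichment(7, 1, 1000, 1000)
```

A ratio like this is sensitive to low counts, which is one way to see why the low signal-to-noise ratio mentioned in the abstract makes DEL data hard to model.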
