Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy
Yunxin Xu, Di Liu, Haipeng Gong
Code: github.com/Gonglab-THU/GeoStab (PyTorch)
Abstract
Accurate prediction of the effects of protein mutations is of great importance in protein engineering and design. Here we propose GeoStab-suite, a suite of three geometric learning-based models (GeoFitness, GeoDDG and GeoDTm) that predict, respectively, the fitness score, ΔΔG and ΔTm of a protein upon mutation. GeoFitness uses a specialized loss function that allows a single unified model to be trained in a supervised manner on the large amount of multi-labeled fitness data in the deep mutational scanning database. To improve the downstream prediction of ΔΔG and ΔTm, for which labeled data are scarce, the encoder of GeoFitness is reused as a pre-trained module in GeoDDG and GeoDTm. This pre-training strategy, combined with data expansion, markedly improves model performance and generalizability. In the benchmark test, GeoDDG and GeoDTm outperform other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient.
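The abstract does not state the form of the specialized loss that lets GeoFitness train one model on many deep mutational scanning assays whose fitness labels sit on different scales. The sketch below assumes one common way to do this, a per-assay pairwise logistic ranking loss, in which mutants are compared only with other mutants from the same assay. It is an illustration under that assumption, not the authors' loss; `pairwise_rank_loss` and its arguments are hypothetical names.

```python
import math
from collections import defaultdict
from itertools import combinations


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_rank_loss(preds, labels, assay_ids):
    """Hypothetical per-assay pairwise logistic ranking loss (an assumption,
    not GeoFitness's published loss).

    Mutants are compared only with other mutants from the same assay, so
    fitness labels measured on different scales never share a comparison.
    Returns the mean loss over all informative (non-tied) pairs.
    """
    groups = defaultdict(list)
    for pred, label, assay in zip(preds, labels, assay_ids):
        groups[assay].append((pred, label))

    total, n_pairs = 0.0, 0
    for members in groups.values():
        for (p_i, y_i), (p_j, y_j) in combinations(members, 2):
            if y_i == y_j:
                continue  # tied labels carry no ordering information
            sign = 1.0 if y_i > y_j else -1.0
            # Small when the model orders the pair the same way as the labels.
            total += _softplus(-sign * (p_i - p_j))
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Two hypothetical assays (A and B) whose labels use different scales.
preds = [2.0, 1.0, 0.0, 0.5, -0.5]
labels = [10.0, 5.0, 1.0, 0.9, 0.1]
assays = ["A", "A", "A", "B", "B"]
print(pairwise_rank_loss(preds, labels, assays))
```

Because only the within-assay ordering enters the loss, rescaling one assay's labels leaves the loss unchanged, which is what makes pooling heterogeneous assays into one supervised model possible under this assumption. The cost is a number of pairs quadratic in assay size, so a practical version would likely subsample pairs.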
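The benchmark comparison is reported in terms of the Spearman correlation coefficient, i.e. the Pearson correlation between the ranks of the predicted and experimental values (for example, predicted versus measured ΔΔG). A minimal pure-Python sketch follows; the helper names `average_ranks`, `pearson` and `spearman` are hypothetical, and tied values receive the mean of their rank positions.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```

Because it depends only on ranks, the coefficient rewards getting the ordering of mutations right rather than matching absolute ΔΔG or ΔTm values. In practice `scipy.stats.spearmanr` computes the same statistic, also assigning average ranks to ties.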