Progressive Multi-Modality Learning for Inverse Protein Folding

2023-12-11

Jiangbin Zheng, Stan Z. Li

Code

github.com/binbinjiang/cvt-slr
pytorch★ 178

Abstract

While deep generative models show promise for learning inverse protein folding directly from data, the scarcity of publicly available structure-sequence pairs limits their generalization. Previous improvements and data augmentation efforts have not been sufficient to overcome this bottleneck. To address this challenge, we propose a novel protein design paradigm called MMDesign, which leverages multi-modality transfer learning. To our knowledge, MMDesign is the first framework that combines a pretrained structural module with a pretrained contextual module, using an auto-encoder (AE) based language model to incorporate prior protein semantic knowledge. Experimental results show that MMDesign, trained only on a small dataset, consistently outperforms baselines on various public benchmarks. To further assess biological plausibility, we present systematic quantitative analysis techniques that provide interpretability and shed light on the laws of protein design.

Tasks

Cross-Modal Alignment, Data Augmentation, Language Modeling, Language Modelling, Protein Design, Protein Folding, Transfer Learning
