Ancestral protein sequence reconstruction using a tree-structured Ornstein-Uhlenbeck variational autoencoder
Lys Sanz Moreta, Ola Rønning, Ahmad Salim Al-Sibahi, Jotun Hein, Douglas Theobald, Thomas Hamelryck
Abstract
We introduce a deep generative model for representation learning of biological sequences that, unlike existing models, explicitly represents the evolutionary process. The model makes use of a tree-structured Ornstein-Uhlenbeck process, obtained from a given phylogenetic tree, as an informative prior for a variational autoencoder. We apply our method to ancestral sequence reconstruction of single protein families and show that the accuracy is better than or on par with conventional phylogenetic methods, while scaling to larger data sets. Our results and ablation studies indicate that the explicit representation of evolution using a suitable tree-structured prior has the potential to improve representation learning of biological sequences considerably. Finally, we briefly discuss extensions of the model to genomic-scale data sets and the case of a latent phylogenetic tree.
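The tree-structured prior can be illustrated with a small sketch. It assumes a stationary Ornstein-Uhlenbeck process with mean mu, mean-reversion rate theta and diffusion sigma. These parameter names are chosen here for illustration, and the paper's exact parameterization may differ. Under that assumption, the covariance between the values at two tree nodes i and j depends only on their patristic distance d_ij (the sum of branch lengths between them): (sigma^2 / 2 theta) * exp(-theta * d_ij). The code uses a hypothetical toy tree. It computes that covariance and also draws values by simulating the process from the root down each branch, which yields the same joint Gaussian. Giving every latent dimension its own independent draw is likewise an assumption of this sketch.

```python
import math
import random

# Hypothetical toy phylogeny: node -> (parent, branch length to parent).
TREE = {
    "root": (None, 0.0),
    "A": ("root", 0.5),
    "B": ("root", 0.3),
    "leaf1": ("A", 0.2),
    "leaf2": ("A", 0.4),
    "leaf3": ("B", 0.6),
}


def path_to_root(tree, node):
    """List of (ancestor, distance from `node`), starting at `node` itself."""
    path, dist = [], 0.0
    while node is not None:
        path.append((node, dist))
        parent, length = tree[node]
        dist += length
        node = parent
    return path


def patristic_distance(tree, a, b):
    """Sum of branch lengths between a and b through their most recent common ancestor."""
    up_a = dict(path_to_root(tree, a))
    for node, dist_b in path_to_root(tree, b):
        if node in up_a:  # first shared ancestor on b's path is the MRCA
            return up_a[node] + dist_b
    raise ValueError("nodes are not in the same tree")


def ou_covariance(tree, nodes, sigma=1.0, theta=1.0):
    """Stationary OU covariance between nodes: sigma^2 / (2 theta) * exp(-theta * d_ij)."""
    scale = sigma ** 2 / (2.0 * theta)
    return [[scale * math.exp(-theta * patristic_distance(tree, i, j)) for j in nodes]
            for i in nodes]


def sample_ou_on_tree(tree, sigma=1.0, theta=1.0, mu=0.0, rng=random):
    """Draw one value per node by running the OU process from the root down each branch."""
    stationary_var = sigma ** 2 / (2.0 * theta)
    values = {}

    def visit(node):
        if node not in values:
            parent, length = tree[node]
            if parent is None:
                # Root starts from the stationary distribution N(mu, sigma^2 / 2 theta).
                values[node] = rng.gauss(mu, math.sqrt(stationary_var))
            else:
                # Exact OU transition over a branch of the given length.
                decay = math.exp(-theta * length)
                mean = mu + (visit(parent) - mu) * decay
                var = stationary_var * (1.0 - decay ** 2)
                values[node] = rng.gauss(mean, math.sqrt(var))
        return values[node]

    for node in tree:
        visit(node)
    return values


def sample_latents(tree, dim, **ou_params):
    """One latent vector per node; each dimension is an independent OU draw (an assumption)."""
    draws = [sample_ou_on_tree(tree, **ou_params) for _ in range(dim)]
    return {node: [d[node] for d in draws] for node in tree}
```

For example, `ou_covariance(TREE, ["leaf1", "leaf2", "leaf3"])` gives a larger covariance for the sister leaves leaf1 and leaf2 (distance 0.6) than for leaf1 and leaf3 (distance 1.6). This is how such a prior pushes closely related sequences toward similar latent representations.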