Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids

2026-03-19

Lucas Ferraz, Ana F. Rodrigues, Pedro Giesteira Cotovio, Mafalda Ventura, Gabriela Silva, Ana Sofia Coroadinha, Miguel Machuqueiro, Catia Pesquita

Abstract

Adeno-associated viral (AAV) vectors are widely used delivery platforms in gene therapy, and the design of improved capsids is key to expanding their therapeutic potential. A central challenge in AAV bioengineering, as in protein design more broadly, is the vast sequence design space relative to the scale of feasible experimental screening. Machine-guided generative approaches provide a powerful means of navigating this landscape and proposing novel protein sequences that satisfy functional constraints. Here, we develop a generative design framework based on protein language models and reinforcement learning to generate highly novel yet functionally plausible AAV capsids. A pretrained model was fine-tuned on experimentally validated capsid sequences to learn patterns associated with viability. Reinforcement learning was then used to guide sequence generation, with a reward function that jointly promoted predicted viability and sequence novelty, thereby enabling exploration beyond regions represented in the training data. Comparative analyses showed that fine-tuning alone produces sequences with high predicted viability but remains biased toward the training distribution, whereas reinforcement learning-guided generation reaches more distant regions of sequence space while maintaining high predicted viability. Finally, we propose a candidate selection strategy that integrates predicted viability, sequence novelty, and biophysical properties to prioritize variants for downstream evaluation. This work establishes a framework for the generative exploration of protein sequence space and advances the application of generative protein language models to AAV bioengineering.
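
As an illustration of what a reward that jointly promotes predicted viability and sequence novelty could look like, here is a minimal Python sketch. The abstract does not specify the reward's form, so everything below is an assumption made for illustration: the weighted-sum combination, the `novelty_weight` parameter, the Hamming-based novelty measure, and the `viability_fn` callable standing in for a trained viability predictor are hypothetical and are not the authors' implementation.

```python
from typing import Callable, Sequence


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions; any length difference counts as mismatches."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / n


def novelty(seq: str, references: Sequence[str]) -> float:
    """Distance to the closest reference sequence (0 = already in the set, 1 = maximally distant)."""
    return min(hamming_fraction(seq, ref) for ref in references)


def reward(
    seq: str,
    references: Sequence[str],
    viability_fn: Callable[[str], float],  # hypothetical predictor returning a score in [0, 1]
    novelty_weight: float = 0.5,           # assumed trade-off parameter, not from the paper
) -> float:
    """Weighted sum of predicted viability and novelty (an assumed form of the joint reward)."""
    return (1.0 - novelty_weight) * viability_fn(seq) + novelty_weight * novelty(seq, references)


if __name__ == "__main__":
    training_set = ["ACDE", "ACDF"]          # toy stand-ins for training capsid sequences
    constant_viability = lambda s: 0.8       # placeholder for a real viability model
    for candidate in ["ACDE", "ACDW", "WWWW"]:
        print(candidate, reward(candidate, training_set, constant_viability))
```

With a fixed viability score, the reward in this sketch increases as a candidate moves away from its nearest training sequence, which is the kind of pressure toward unexplored sequence space the abstract describes. A real setup would replace the toy distance and the constant predictor with whatever measures the study actually used.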
