Can protein design be made flexible and controllable?

Original title: Flexible and Controllable Protein Design by Prefix-tuning Large-Scale Protein Language Models

Authors: Jiawei Luo, Xianliang Liu, Jiahao Li, Qingcai Chen, Junjie Chen

In this article, the authors discuss the challenges and potential of designing novel proteins for specific biomedical purposes. Protein language models (ProtLMs) have made significant strides in controllable protein design, but they are constrained by the limited vocabulary of protein sequences and by the difficulty of fine-tuning when data is scarce. To address these issues, the authors propose PrefixProt, a method that uses prefix-tuning to learn virtual tokens for different protein properties. These virtual tokens then serve as prompts that steer a pre-trained ProtLM toward generating proteins with tailored structures and functions. The authors trained two prefix virtual tokens on different datasets and report better results than fine-tuning, even with limited data. They also show that the learned virtual tokens can be combined to control the generation of proteins with specific functions and structures. Overall, PrefixProt offers a flexible and controllable approach to protein design, with the potential to advance biomedical research and drug discovery.
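To make the idea of virtual tokens more concrete, here is a minimal Python sketch of how learned prefixes might be placed in front of a protein sequence before it reaches a frozen model. It is an illustration under stated assumptions, not the authors' implementation: the names `FROZEN_EMBEDDINGS`, `new_prefix`, and `build_inputs`, the embedding size, and the prefix lengths are all hypothetical, and combining two prefixes by simple concatenation is an assumption. The sketch also simplifies by prepending vectors at the input-embedding level, whereas prefix-tuning as usually described injects trainable vectors into the attention keys and values of every layer.

```python
import random

EMBED_DIM = 8  # hypothetical embedding size, chosen only for illustration
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

random.seed(0)

# Stand-in for the frozen embedding table of a pre-trained ProtLM.
# In prefix-style tuning these weights are never updated.
FROZEN_EMBEDDINGS = {
    aa: [random.gauss(0, 1) for _ in range(EMBED_DIM)] for aa in AMINO_ACIDS
}


def new_prefix(num_virtual_tokens):
    """Create the trainable virtual-token vectors for one protein property."""
    return [
        [random.gauss(0, 0.02) for _ in range(EMBED_DIM)]
        for _ in range(num_virtual_tokens)
    ]


def build_inputs(prefixes, sequence):
    """Prepend one or more prefixes to the frozen embeddings of a sequence.

    Assumption: several property prefixes are combined by concatenation.
    """
    virtual = [vec for prefix in prefixes for vec in prefix]
    residues = [FROZEN_EMBEDDINGS[aa] for aa in sequence]
    return virtual + residues


# Two hypothetical prefixes, one per property, each with 4 virtual tokens.
structure_prefix = new_prefix(4)
function_prefix = new_prefix(4)

inputs = build_inputs([structure_prefix, function_prefix], "MKT")
print(len(inputs), len(inputs[0]))  # 11 8 (8 virtual tokens + 3 residues)
```

During training, only the vectors returned by `new_prefix` would receive gradient updates, which is why this style of tuning needs far fewer trainable parameters than full fine-tuning and can be attractive when data is limited.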

Original article: https://www.biorxiv.org/content/10.1101/2023.12.03.569747v1