Original title: Investigating the ability of deep learning-based structure prediction to extrapolate and/or enrich the set of antibody CDR canonical forms
Authors: Alexander Greenshields-Watson, Brennan Abanades, Charlotte M Deane
This article explores how well deep learning models that predict protein structures from sequences can extrapolate beyond their training data. The researchers used a deep learning antibody structure predictor, ABodyBuilder2, to predict the structures of around 1.5 million paired antibody sequences. They then examined the predicted structures of the complementarity-determining region (CDR) loops, comparing them with the known canonical forms of those loops.
The results showed that most of the predicted structures fell within the already known structural space of the CDR loop. However, they also discovered a small number of new structural clusters that were composed of diverse sequences but shared a common sequence motif and loop conformation.
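The idea of a prediction "falling within known structural space" can be illustrated with a minimal sketch. It assumes each loop is an array of backbone coordinates already superposed on a common frame, and that each known cluster is represented by a single centroid loop of the same length. The function names, the centroid dictionary and the 1.5 Å cutoff are hypothetical choices for illustration, not the procedure or threshold used in the study.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays that are already superposed."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("loops must have the same number of atoms")
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def assign_to_known_cluster(loop, centroids, cutoff=1.5):
    """Find the nearest same-length centroid.

    Returns (cluster_name, distance). cluster_name is None when no centroid
    of matching length lies within `cutoff` (an illustrative value, in
    angstroms), i.e. the loop is a candidate for a new cluster.
    """
    best_name, best_dist = None, float("inf")
    for name, centroid in centroids.items():
        if np.shape(centroid) != np.shape(loop):
            continue  # only compare loops of identical length
        d = rmsd(loop, centroid)
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > cutoff:
        return None, best_dist
    return best_name, best_dist
```

Under these assumptions, loops that return a cluster name correspond to the majority that fell within known space, while loops returning None would be collected and clustered among themselves to look for shared sequence motifs.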
To test the ability of the deep learning model to extrapolate, the researchers retrained several models while withholding certain types of antibody structures. These retrained models showed evidence of generalization across different lengths of the CDR loop, but they were not able to predict loop conformations that were highly distinct from those in the training data.
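The withholding experiment can be pictured as a simple data split made before retraining. The sketch below assumes each training structure is described by a small record; the field names `cdr_length` and `cluster` and the example values are hypothetical and are not taken from the study's data.

```python
def withhold(structures, field, value):
    """Split records into (training set, withheld set) on one attribute.

    Every record whose `field` equals `value` is removed from training,
    so the retrained model never sees that class of loop.
    """
    train = [s for s in structures if s[field] != value]
    withheld = [s for s in structures if s[field] == value]
    return train, withheld

# Toy records with made-up identifiers and labels, for illustration only.
structures = [
    {"id": "ab1", "cdr_length": 10, "cluster": "A"},
    {"id": "ab2", "cdr_length": 10, "cluster": "B"},
    {"id": "ab3", "cdr_length": 11, "cluster": "C"},
    {"id": "ab4", "cdr_length": 12, "cluster": "A"},
]

# Withhold every structure with an 11-residue loop, then retrain on `train`
# and evaluate on `withheld` to see whether the model extrapolates.
train, withheld = withhold(structures, "cdr_length", 11)
```

Evaluating the retrained model only on the withheld set is what separates genuine extrapolation from recall of conformations seen during training.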
Overall, the study suggests that deep learning models for protein structure prediction cannot make completely out-of-domain predictions for CDR loops. However, including even a small number of training examples of a given loop conformation is enough for the model to predict that conformation again. The researchers have made the predicted structures used in the study available for download.
Original article: https://www.biorxiv.org/content/10.1101/2023.12.08.570786v1