Can GPT-4 determine if articles support or refute variant pathogenicity?

Original title: Using GPT-4 Prompts to Determine Whether Articles Contain Functional Evidence Supporting or Refuting Variant Pathogenicity

Authors: Samuel J. Aronson (1,2), Kalotina Machini (1,3), Pranav Sriraman (1), Jiyeon Shin (2), Emma R. Henricks (1), Charlotte Mailly (1,2), Angie J. Nottage (1), Michael Oates (1,2), Matthew S. Lebo (1,3) ((1) Mass General Brigham Personalized Medicine, (2) Accelerator for Clinical Transformation, Mass General Brigham, (3) Department of Pathology, Brigham and Women’s Hospital)

In this article, the researchers evaluate how well Generative Pre-trained Transformer version 4 (GPT-4) can classify articles that contain functional evidence relevant to assessing variant pathogenicity. They tuned the GPT-4 settings and prompts on a set of 45 articles and genetic variants, then ran two prompts on a test set of 72 manually classified articles and variants. The first prompt asked GPT-4 to identify articles with functional evidence; it achieved a sensitivity of 87% and a positive predictive value (PPV) of 89%, although GPT-4 wrongly reported functional evidence for 5 of the 26 articles that contained no functional data. The second prompt asked GPT-4 to classify the evidence as pathogenic, benign, or intermediate/inconclusive. It showed high sensitivity and PPV for the pathogenic and benign categories, but lower sensitivity for intermediate or inconclusive evidence. The authors conclude that GPT-4 can effectively detect the presence or absence of functional assays and help prioritize articles for review, but that it cannot fully automate the genetics literature review step of variant classification.
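
As a rough illustration of how the sensitivity and PPV figures for the first prompt relate to article counts, the Python sketch below recomputes them from a confusion matrix. Only the 72-article test set, the 26 articles without functional data, and the 5 false positives come from the summary above. The split of the remaining 46 articles into 40 detected and 6 missed is an assumption: it is the split that reproduces the reported 87% and 89%, but it is not stated in the summary.

```python
# Sketch of the sensitivity / PPV arithmetic for the first prompt.
# From the summary: 72 test articles, 26 without functional data, 5 false positives.
# ASSUMED (not stated in the summary): 40 true positives and 6 false negatives,
# chosen because they reproduce the reported 87% sensitivity and 89% PPV.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of articles that truly have functional evidence and were flagged."""
    return tp / (tp + fn)

def ppv(tp: int, fp: int) -> float:
    """Fraction of flagged articles that truly have functional evidence."""
    return tp / (tp + fp)

total_articles = 72
without_evidence = 26
with_evidence = total_articles - without_evidence  # 46

false_positives = 5          # from the summary
true_positives = 40          # assumed
false_negatives = with_evidence - true_positives  # 6, follows from the assumption

sens = sensitivity(true_positives, false_negatives)
precision = ppv(true_positives, false_positives)

print(f"sensitivity = {sens:.0%}")  # 87%
print(f"PPV         = {precision:.0%}")  # 89%
```

Read this way, PPV is driven by the 5 false positives among the 26 articles without functional data, while sensitivity depends only on how many of the articles with functional evidence were missed.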

Original article: https://arxiv.org/abs/2312.13521