The development of artificial intelligence and machine learning algorithms may allow for advances in patient care. There are existing and potential applications in cancer diagnosis and monitoring, identification of at-risk groups of individuals, classification of genetic variants, and even prediction of patient ancestry. This article provides an overview of some current and future applications of artificial intelligence in genomic medicine, in addition to discussing challenges and considerations when bringing these tools into clinical practice.
Artificial intelligence (AI) algorithms applied to genomic data can detect cancer, classify variants, and predict gene expression, among many other applications.
Historical inequities and biases exist in current, predominantly European-based data sets, which may lead to further inequities. “Fair” algorithms are being developed to address this.
Models developed from data sets of predominantly European ancestry may not translate well to other populations; therefore, diverse data sets are required.
There are ethical and legal considerations when building algorithms based on sensitive genomic data, and care must be taken to implement these models responsibly.
Artificial intelligence (AI) and machine learning (ML) have drastically changed society since their inception. These technologies now reach nearly every industry, from targeted advertising to credit card and loan approval, and play a role in aspects of health care. The study of the human genome has likewise had a major impact on society, both in the hospital setting and through commercially available genetic tests for ancestry and certain cancer risk genes. Both AI and genomic medicine are centered on data and have been made possible by advances in technology. It is perhaps unsurprising, therefore, that they synergize so well in advancing scientific knowledge and in the diagnosis and management of patients. Here the authors explore a selection of topics in which AI has been used or has the opportunity to advance genomic medicine. Current and future challenges are also discussed, including algorithmic fairness, data security and privacy, and interpretability.
Variants of uncertain significance
All individuals have numerous single nucleotide variants (SNVs) in the protein-coding regions of their genomes when compared with other individuals or with a reference human genome sequence. However, many such variants, particularly rare ones, are variants of uncertain significance (VUS). By definition, a VUS is neither known to have a phenotypic impact nor known to lack one (and if it does have an impact, the nature of that impact is uncertain). VUS comprise the vast majority of SNVs identified in any patient genomic or exomic analysis.
A typical workflow in the clinical evaluation of patient genomic testing involves algorithmic identification of variants from sequencing data, followed by manual review of online databases such as ClinVar, OMIM, and ClinGen, among other resources. The literature is reviewed for reports and interpretations of the variants in question, and each variant is then deemed to be pathogenic, benign, or of uncertain significance. Guidelines for the interpretation of these variants have been published by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP).
VUS pose a particular challenge to clinicians, and their interpretation often changes as new data emerge. Because this is a classification problem over a large and complex data set, it is well suited to AI-based approaches. To that end, many algorithms, such as PolyPhen-2, SIFT, and PROVEAN, have been developed to predict the deleteriousness of variants based on biophysical properties.
VUS can be categorized with supervised or unsupervised methods. In supervised machine learning, labels come from manually curated mutations and/or experiments. Unsupervised methods, by contrast, build classification models from unlabeled data by learning some underlying structure within the data points or by integrating external information.
Several supervised AI/ML methods have been developed to estimate the likelihood that a variant is deleterious. DEOGEN2, for example, is a predictor for missense SNVs in human proteins. It incorporates information about molecular effects, involved domains, gene relevance, and gene interactions, drawing on evolutionary-based features, prediction of early folding protein residues, features related to protein domains, interaction patches, and gene- and pathway-oriented features. This information is then “mapped” into a deleteriousness score for the variant. The model achieved performance comparable to that of other published predictors on the Humsavar16 data set.
High-throughput experiments that can evaluate thousands of variants simultaneously, called multiplexed assays of variant effects, have also been developed. These high-throughput sequencing assays interrogate the effects of various variant types and allow functional assays to scale massively. Historically, most functional assays have been reactive: a variant surfaces clinically, and a functional assay is then performed to explore its effects. However, given the vast number of uncharacterized variants and the computing power now available, a more proactive experimental approach becomes attractive.
Evolutionary model of variant effect (EVE) is an example of unsupervised learning applied to variant classification. EVE classifies human genetic variants based solely on evolutionary sequences, using the distribution of sequence variation across different species to determine the likelihood of pathogenicity in humans. The model outperforms other state-of-the-art computational methods of variant classification. In fact, EVE has been found to be as accurate as high-throughput experiments in variant classification. The model was able to predict the pathogenicity of more than 36 million variants, including evidence for the classification of more than 256,000 VUS.
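
To make the idea of label-free scoring from evolutionary sequences more concrete, the following is a minimal sketch under stated assumptions and is not the EVE method itself. It assumes a tiny, made-up alignment of homologous protein sequences and scores a substitution by how much rarer the variant residue is than the reference residue at that position across species. The function names, the pseudocount, and the toy sequences are all hypothetical and chosen only for illustration; published models use far richer representations of sequence variation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, pseudocount=1.0):
    """Per-position amino acid frequencies with additive smoothing.

    `alignment` is a list of equal-length protein sequences, one per species.
    No pathogenic/benign labels are used anywhere (unsupervised).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        # Count only standard residues so the frequencies sum to 1.
        observed = sum(counts.get(aa, 0) for aa in AMINO_ACIDS)
        total = observed + pseudocount * len(AMINO_ACIDS)
        freqs.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return freqs


def variant_score(freqs, reference, pos, alt):
    """Log-ratio of reference to variant residue frequency (0-based position).

    Higher values mean the variant residue is rarely seen across species
    at this position, which this toy model treats as more suspicious.
    """
    ref = reference[pos]
    return math.log(freqs[pos][ref] / freqs[pos][alt])


# Hypothetical toy alignment; the first row stands in for the human sequence.
alignment = [
    "MKVLA",
    "MKVLA",
    "MKILA",
    "MRVLS",
    "MKVLG",
]
human = alignment[0]
freqs = column_frequencies(alignment)

# Position 0 is fully conserved, so replacing M with W scores high.
conserved_score = variant_score(freqs, human, 0, "W")  # about 1.79
# Position 2 already tolerates I in another species, so V to I scores lower.
tolerated_score = variant_score(freqs, human, 2, "I")  # about 0.92
print(f"M1W: {conserved_score:.2f}  V3I: {tolerated_score:.2f}")
```

A supervised predictor would instead need curated pathogenic and benign labels to fit its parameters. Here the only input is the alignment, which is why an approach of this kind can in principle be applied to variants that have never been observed clinically.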