Performance of mutation pathogenicity prediction tools on missense variants associated with 46,XY differences of sex development

Montenegro, Luciana R; Lerário, Antônio M; Nishi, Miriam Y; Jorge, Alexander A L; Mendonca, Berenice B

OBJECTIVES:

Single nucleotide variants (SNVs) are the most common type of genetic variation among humans. High-throughput sequencing methods have recently characterized millions of SNVs in several thousand individuals from various populations; most of these SNVs are benign polymorphisms. Identifying rare disease-causing SNVs remains challenging and often requires functional in vitro studies. Prioritizing the most likely pathogenic SNVs is therefore of utmost importance, and several computational methods have been developed for this purpose. However, these methods are based on different assumptions and often produce discordant results. The aim of the present study was to evaluate the performance of 11 widely used, freely available pathogenicity prediction tools in identifying known pathogenic SNVs: FATHMM, Mutation Assessor, Protein Analysis Through Evolutionary Relationships (PANTHER), Sorting Intolerant From Tolerant (SIFT), Mutation Taster, Polymorphism Phenotyping v2 (PolyPhen-2), Align Grantham Variation Grantham Deviation (Align-GVGD), CADD, PROVEAN, SNPs&GO, and MutPred.

METHODS:

We analyzed 40 functionally proven pathogenic SNVs in four different genes associated with differences of sex development (DSD): 17β-hydroxysteroid dehydrogenase 3 (HSD17B3), steroidogenic factor 1 (NR5A1), androgen receptor (AR), and luteinizing hormone/chorionic gonadotropin receptor (LHCGR). To evaluate the false discovery rate of each tool, we also analyzed 36 frequent benign SNVs (minor allele frequency [MAF] >0.01) found in the same four DSD genes.

The quality of the predictions was analyzed using six parameters: accuracy, precision, negative predictive value (NPV), sensitivity, specificity, and Matthews correlation coefficient (MCC). Overall performance was assessed using a receiver operating characteristic (ROC) curve.
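The six parameters above are all functions of a tool's confusion matrix, taking pathogenic SNVs as the positive class and benign SNVs as the negative class. A minimal Python sketch under that assumption follows; the function name `confusion_metrics` and the example counts are hypothetical illustrations, not the study's code or results. Only the class sizes (40 pathogenic, 36 benign) come from the text.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn):
    """Compute the six evaluation parameters from confusion-matrix counts.

    tp: pathogenic SNVs predicted pathogenic
    fn: pathogenic SNVs predicted benign
    tn: benign SNVs predicted benign
    fp: benign SNVs predicted pathogenic
    """
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for one tool, chosen only so that they sum to the
# 40 pathogenic and 36 benign SNVs described above; not reported results.
example = confusion_metrics(tp=36, fp=12, tn=24, fn=4)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

MCC uses all four cells of the confusion matrix, so a tool that calls nearly every variant pathogenic can reach high sensitivity while still scoring an MCC near zero.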

RESULTS:

None of the tools was 100% precise in identifying pathogenic SNVs. The highest specificity, precision, and accuracy were observed for Mutation Assessor, MutPred, and SNPs&GO, which also performed best in the ROC curve analysis. Of the 11 tools evaluated, 6 (Mutation Assessor, PANTHER, SIFT, Mutation Taster, PolyPhen-2, and CADD) exhibited sensitivity >0.90 but lower specificity (0.42-0.67). Performance, based on MCC, ranged from poor (FATHMM=0.04) to reasonably good (MutPred=0.66).

CONCLUSION:

Computational algorithms are important tools for SNV analysis, but their correlation with functional studies is not consistent. In the present analysis, the best performing tools (based on accuracy, precision, and specificity) were Mutation Assessor, MutPred, and SNPs&GO, which showed the best concordance with functional studies.