TY - JOUR
T1 - Family-specific analysis of variant pathogenicity prediction tools
AU - Zaucha, Jan
AU - Heinzinger, Michael
AU - Tarnovskaya, Svetlana
AU - Rost, Burkhard
AU - Frishman, Dmitrij
N1 - Publisher Copyright: © The Author(s) 2020. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Using the presently available datasets of annotated missense variants, we performed a protein family-specific benchmark of tools for predicting the pathogenicity of single amino acid variants. We find that despite the high overall accuracy of all tested methods, each tool has its Achilles heel, i.e. protein families in which its predictions prove unreliable (in these families, expected accuracy does not exceed 51% for any method). As a proof of principle, we show that choosing the optimal tool and pathogenicity threshold for each protein family individually yields reliable predictions in all Pfam domains (accuracy no less than 68%). A functional analysis of the sets of protein domains annotated exclusively by neutral or pathogenic mutations indicates that specific protein functions can be associated with a low or high sensitivity to mutations, respectively. The highly sensitive sets of protein domains are involved in the regulation of transcription and DNA sequence-specific transcription factor binding, while the domains that do not result in disease when mutated are responsible for mediating immune and stress responses. These results suggest that future predictors of pathogenicity, and especially variant prioritization tools, may benefit from considering functional annotation.
AB - Using the presently available datasets of annotated missense variants, we performed a protein family-specific benchmark of tools for predicting the pathogenicity of single amino acid variants. We find that despite the high overall accuracy of all tested methods, each tool has its Achilles heel, i.e. protein families in which its predictions prove unreliable (in these families, expected accuracy does not exceed 51% for any method). As a proof of principle, we show that choosing the optimal tool and pathogenicity threshold for each protein family individually yields reliable predictions in all Pfam domains (accuracy no less than 68%). A functional analysis of the sets of protein domains annotated exclusively by neutral or pathogenic mutations indicates that specific protein functions can be associated with a low or high sensitivity to mutations, respectively. The highly sensitive sets of protein domains are involved in the regulation of transcription and DNA sequence-specific transcription factor binding, while the domains that do not result in disease when mutated are responsible for mediating immune and stress responses. These results suggest that future predictors of pathogenicity, and especially variant prioritization tools, may benefit from considering functional annotation.
UR - http://www.scopus.com/inward/record.url?scp=85112744033&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqaa014
DO - 10.1093/nargab/lqaa014
M3 - Article
AN - SCOPUS:85112744033
SN - 2631-9268
VL - 2
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 2
M1 - lqaa014
ER -