Interpreting Disease Relevance of Amino Acid Substitutions


High‐throughput sequencing methods can generate large amounts of information about genetic variations; however, interpretation of these data has become a severe bottleneck for the efficient use of genomic data, for example, in diagnostics. Identifying the variations responsible for phenotypes is a laborious and often difficult task. Amino acid substitutions are among the most common disease‐causing variants. An individual human genome codes for, on average, approximately 11 000 such variants. Computational tools are needed to filter and rank raw variation datasets for further studies. Amino acid substitutions can have numerous effects, and the mechanisms behind them are diverse. Therefore, different kinds of methods have been developed. Tolerance predictors aim to identify variants that are likely to be harmful. Mechanism‐ and effect‐specific tools are dedicated to specific outcomes of variants.

Key Concepts:

  • An amino acid substitution is a change in a protein sequence in which a single residue is replaced by another.

  • A benchmark dataset contains cases with known effects. It serves as the gold standard, for example, for assessing method performance and for training machine learning‐based methods.

  • The Human Variome Project (HVP) is an international organisation coordinating research and standards in the study of variation.

  • Next generation sequencing methods are fast nucleotide sequencing methods that take advantage of multiplexing and are capable of sequencing complete genomes very quickly.

  • Performance measures are used to indicate the performance of prediction methods. For a full picture of performance, a number of measures should be reported.

  • Tolerance predictors are methods to predict whether amino acid substitutions are tolerated or not in a sequence.

  • A variation is a change in a nucleotide or amino acid sequence in comparison with the reference sequence.
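
The point that several performance measures should be reported together can be illustrated with a minimal sketch. It assumes a binary tolerance predictor (pathogenic versus neutral) evaluated against a benchmark dataset. The choice of measures (sensitivity, specificity, positive and negative predictive value, accuracy and the Matthews correlation coefficient) is an assumption made here for illustration, and the function name and the counts are hypothetical rather than taken from any study.

```python
import math


def performance_measures(tp, tn, fp, fn):
    """Return common binary-classification measures from confusion-matrix counts.

    tp/fn: pathogenic benchmark variants predicted pathogenic/neutral.
    tn/fp: neutral benchmark variants predicted neutral/pathogenic.
    """
    def ratio(numerator, denominator):
        # Undefined measures (zero denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration only.
measures = performance_measures(tp=80, tn=70, fp=30, fn=20)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the accuracy is 0.75 while the Matthews correlation coefficient is only about 0.50, which shows why a single measure can give an incomplete picture.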

Keywords: variants; pathogenicity; amino acid substitution; tolerance predictor; cancer; locus specific databases; protein localisation; machine learning; next generation sequencing; protein stability predictor

Figure 1.

Schema for pathogenicity assessment of variants with experimental and prediction methods.

Figure 2.

Flowchart for bioinformatics analysis of variant effects.




Further Reading

Stefl S, Nishi H, Petukh M et al. (2013) Molecular mechanisms of disease‐causing missense mutations. Journal of Molecular Biology 425: 3919–3936.

How to Cite
Vihinen, Mauno (Mar 2014) Interpreting Disease Relevance of Amino Acid Substitutions. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0025177]