Selective Constraints and Human Disease Genes: Evolutionary and Bioinformatics Approaches


Natural selection rejects, with variable strength, mutations that reduce an individual's capacity to survive and reproduce. Evolutionary theory therefore predicts that disease-causing mutations will be under strong selective constraints. The strength of selection at the codon level determines whether the frequency of a mutation increases, decreases or changes randomly during evolution, and this strength can in turn be used to predict which nonsynonymous single nucleotide polymorphisms (nsSNPs) produce disease in humans. Using comparative genomics data and maximum likelihood phylogenetic approaches, we demonstrate that mutations at residues showing low rates of evolution are significantly associated with disease and not with human genetic polymorphisms.

Keywords: evolution; disease; natural selection; comparative genomics; nsSNPs

Figure 1.

Distribution of p53 mutations. Mutation frequencies collected in the IARC TP53 R10 database (18 145 nonsynonymous mutations) are plotted against the protein domains. The DNA‐binding (p53DB) domain contains six residues considered mutational hotspots in cancer. Reproduced from Arbiza et al., Copyright Elsevier (2006).

Figure 2.

Mapping of selective constraints in the p53DB domain. (a) The three‐dimensional structure of the p53DB domain showing residues coloured according to different selective pressures. (b) Primary amino acid sequence and secondary structure elements of the p53DB domain. Residues in red, orange, yellow and green show the gradual distribution of the selective constraints represented by ωSLR values. Residues in red (0⩽ω<0.1) and orange (0.1⩽ω<0.2) are generally associated with DNA contact sites (blue circles), Zn2+ contact sites (white circles) and sites where mutants are known to be denaturing (red circles), among others. A few of the sites seem to be below the limit considered for selective constraints (yellow, 0.2⩽ω<0.3). Residues where selective constraints were predicted to be low (green, ω>0.3) are distributed along the external regions of the core domain, and most of them are interspersed between β sheets and helices. Arg248 binds in the minor groove of the DNA. Ser185 was conserved in the cluster of primates and rodents, but was discarded from the analysis because of gap insertions in the basal species. *, phylogenetically conserved residue; +, SLR detected PFS at 95% confidence after correcting for multiple testing; and !, as in +, but at 99% confidence. See text for a detailed explanation. Reproduced from Arbiza et al., Copyright Elsevier (2006).
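
The colour scale in this caption amounts to binning each residue by its ω value. A minimal Python sketch of that binning follows. The function name `omega_colour` is hypothetical, and placing ω exactly equal to 0.3 in the green bin is an assumption, since the caption gives yellow as 0.2⩽ω<0.3 and green as ω>0.3.

```python
def omega_colour(omega: float) -> str:
    """Map a per-site omega value to the colour bins listed in the caption."""
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if omega < 0.1:
        return "red"      # 0 <= omega < 0.1
    if omega < 0.2:
        return "orange"   # 0.1 <= omega < 0.2
    if omega < 0.3:
        return "yellow"   # 0.2 <= omega < 0.3
    return "green"        # assumption: omega == 0.3 is grouped with omega > 0.3
```

Lower bins correspond to stronger selective constraint, so red and orange sites are the ones the caption links to DNA contact, Zn2+ contact and structurally sensitive positions.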

Figure 3.

Mutation frequency and ω distribution in p53. (a) The distribution of p53 residues in the ω‐frequency space describes an ‘L‐shaped’ curve where sites under selective constraints (low ω values) are preferentially associated with high mutational frequencies in cancer. Conversely, residues above the limit of the effects of purifying selection (high ω) are preferentially associated with low mutational frequencies. (b) Mutational hotspots show high evolutionary constraints imposed by natural selection and the highest mutational frequencies. The cutoff value represents the maximum ω value for which residues were deduced to be under the influence of purifying selection at 95% or 99% confidence using the SLR method. This threshold represents the a priori hypothesis used to test for a statistically significant association between mutation frequency and ω values using a large set of human disease genes. Reproduced from Arbiza et al., Copyright Elsevier (2006).

Figure 4.

ω distribution, disease and polymorphism. More than 8000 amino acid mutations defined as disease and polymorphic variation in the Swiss‐Prot database are clearly differentiated by selective constraints. The boxplot shows the median (horizontal bold line), the upper and lower quartiles (box) and the interquartile range (dashed vertical lines). For visual clarity a horizontal dotted line indicates ω=0.1.
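
The summary statistics named in this caption (median, lower and upper quartiles, interquartile range) can be computed per group as sketched below. This is an illustration only: the function name `boxplot_summary` is hypothetical, the choice of the "inclusive" quartile method is an assumption, and no values from the Swiss‐Prot data set are reproduced here.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Return the quartiles and interquartile range of a list of omega values."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": med, "q3": q3, "iqr": q3 - q1}
```

Calling `boxplot_summary` once on the ω values of disease mutations and once on those of polymorphisms gives the numbers a boxplot of the two groups would display.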

Figure 5.

Analysis of the selective constraints of cSNPs in the human genome. The PupaSuite web server provides a complete set of tools for SNP characterization to assist users interested in genotyping experiments. The selection strength acting on all the cSNPs of the human genes can be reported according to different thresholds. By default the functional analysis of PupaSuite reports cSNPs showing ω<0.1.
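
Reporting cSNPs under a chosen threshold is, in essence, a filter on per-site ω values. The sketch below is not the PupaSuite interface; the function name `constrained_snps`, the input layout (pairs of identifier and ω) and the identifiers in the usage note are all hypothetical. Only the default threshold of 0.1 is taken from the caption.

```python
def constrained_snps(snps, threshold=0.1):
    """Return identifiers of cSNPs whose omega is strictly below the threshold.

    snps: iterable of (identifier, omega) pairs (hypothetical input layout).
    """
    return [snp_id for snp_id, omega in snps if omega < threshold]
```

For example, `constrained_snps([("snp_a", 0.02), ("snp_b", 0.45)])` keeps only `"snp_a"`, while passing `threshold=0.5` keeps both.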



Arbiza L, Dopazo J and Dopazo H (2006) Selective pressures at a codon‐level predict deleterious mutations in human disease genes. Journal of Molecular Biology 358: 1390–1404.

Bao L, Zhou M and Cui Y (2005) nsSNPAnalyzer: identifying disease‐associated nonsynonymous single nucleotide polymorphisms. Nucleic Acids Research 33: W480–W482.

Botstein D and Risch N (2003) Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nature Genetics 33(suppl.): 228–237.

Breiman L (2001) Random forest. Technical Report, Statistics Department UCB.

Bromberg Y and Rost B (2007) SNAP: predict effect of non‐synonymous polymorphisms on function. Nucleic Acids Research 35(11): 3823–3835.

Capriotti E, Fariselli P, Calabrese R and Casadio R (2005) Predicting protein stability changes from sequences using support vector machines. Bioinformatics 21(suppl. 2): ii54–ii58.

Capriotti E, Arbiza L, Casadio R et al. (2008) Use of estimated evolutionary strength at the codon level improves the prediction of disease‐related protein mutations in human. Human Mutation 29(1): 198–204.

Cargill M, Altshuler D, Ireland J et al. (1999) Characterization of single nucleotide polymorphisms in coding regions of human genes. Nature Genetics 22: 231–238.

Chasman D and Adams RM (2001) Predicting the functional consequences of non‐synonymous single nucleotide polymorphisms: structure‐based assessment of amino acid variation. Journal of Molecular Biology 307(2): 683–706.

Cho Y, Gorina S, Jeffrey PD and Pavletich NP (1994) Crystal structure of a p53 tumor suppressor‐DNA complex: understanding tumorigenic mutations. Science 265: 346–355.

Collins FS, Green ED, Guttmacher AE, Guyer MS and US National Human Genome Research Institute (2003) A vision for the future of genomics research. Nature 422: 835–847.

Conde L, Vaquerizas JM, Dopazo H et al. (2006) PupaSuite: finding functional SNPs for large‐scale genotyping purposes. Nucleic Acids Research 34: W621–W625.

Felsenstein J (1985) Phylogenies and the comparative method. The American Naturalist 125: 1–15.

Ferrer‐Costa C, Gelpí JL, Zamakola L et al. (2005) PMUT: a web‐based tool for the annotation of pathological mutations on proteins. Bioinformatics 21(14): 3176–3178.

Golding B (1994) Using maximum likelihood to infer selection from phylogenies. In: Golding B (ed.) Non‐neutral Evolution. Theories and Molecular Data, pp. 126–139. New York: Chapman & Hall.

IHMC (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164): 851–861.

Karchin R, Diekhans M, Kelly L et al. (2005) LS‐SNP: large‐scale annotation of coding non‐synonymous SNPs based on multiple information sources. Bioinformatics 21(12): 2814–2820.

Kimura M (1983) The Neutral Theory of Molecular Evolution. New York: Cambridge University Press.

Massingham T and Goldman N (2005) Detecting amino acid sites under positive selection and purifying selection. Genetics 169: 1753–1762.

Miller MP and Kumar S (2001) Understanding human disease mutations through the use of interspecific genetic variation. Human Molecular Genetics 10: 2319–2328.

Ng PC and Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Research 11(5): 863–874.

Ng PC and Henikoff S (2003) SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Research 31: 3812–3814.

Ramensky V, Bork P and Sunyaev S (2002) Human non‐synonymous SNPs: server and survey. Nucleic Acids Research 30(17): 3894–3900.

Reumers J, Maurer‐Stroh S, Schymkowitz J and Rousseau F (2006) SNPeffect v2.0: a new step in investigating the molecular phenotypic effects of human non‐synonymous SNPs. Bioinformatics 22: 2183–2185.

Santibáñez Koref MF, Gangeswaran R, Santibáñez Koref IP, Shanahan N and Hancock JM (2003) A phylogenetic approach to assessing the significance of missense mutations in disease genes. Human Mutation 22: 51–58.

Saunders CT and Baker D (2002) Evaluation of structural and evolutionary contributions to deleterious mutation prediction. Journal of Molecular Biology 322(4): 891–901.

Sunyaev S, Ramensky V, Koch I et al. (2001) Prediction of deleterious human alleles. Human Molecular Genetics 10: 591–597.

Sunyaev SR, Lathe WC 3rd, Ramensky VE and Bork P (2000) SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends in Genetics 16(8): 335–337.

Thomas PD and Kejariwal A (2004) Coding single‐nucleotide polymorphisms associated with complex vs. Mendelian disease: evolutionary evidence for differences in molecular effects. Proceedings of the National Academy of Sciences of the USA 101(43): 15398–15403.

Thomas PD, Campbell MJ, Kejariwal A et al. (2003) PANTHER: a library of protein families and subfamilies indexed by function. Genome Research 13: 2129–2141.

Wang Z and Moult J (2001) SNPs, protein structure, and disease. Human Mutation 17: 263–270.

Yang Z (2003) Adaptive molecular evolution. In: Balding D, Bishop M and Cannings C (eds) Handbook of Statistical Genetics, 2nd edn, pp. 229–254. New York: Wiley.

Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24(8): 1586–1591.

Ye ZQ, Zhao SQ, Gao G et al. (2007) Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP). Bioinformatics 23(12): 1444–1450.

Yue P, Melamud E and Moult J (2006) SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 7(1): 166.

Zuckerkandl E and Pauling L (1965) Molecules as documents of evolutionary history. Journal of Theoretical Biology 8: 357–366.

Further Reading

Ng PC and Henikoff S (2006) Predicting the effects of amino acid substitutions on protein function. Annual Review of Genomics and Human Genetics 7: 61–80.

Yampolsky LY, Kondrashov FA and Kondrashov AS (2005) Distribution of the strength of selection against amino acid replacements in human proteins. Human Molecular Genetics 14: 3191–3201.

How to Cite
Dopazo, Hernán (Jul 2008) Selective Constraints and Human Disease Genes: Evolutionary and Bioinformatics Approaches. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0020762]