Selection against Amino Acid Replacements in Human Proteins


The vast majority of mutations occurring in the coding regions of human genes alter the amino acids encoded in proteins. A significant proportion of these mutations are known to disrupt the structure and/or function of human proteins. The similarity of a large fraction of the amino acid residues of human protein sequences to those of other species implies that amino acid altering mutations were purged by natural selection during the evolution of the human lineage. In contrast, owing to genetic drift, amino acid replacement mutations persist within human populations at low frequencies, and a high proportion of them are harmful to humans. Therefore, understanding the contrasting patterns in the long‐term and short‐term evolutionary histories of human proteins is vital for identifying the amino acid mutations associated with human genetic diseases.

Key Concepts:

  • Natural selection eliminates a high proportion of amino acid changing mutations over time.

  • Selection against amino acid replacement mutations can be quantified at the level of the whole proteome, a specific protein or an individual amino acid residue.

  • The intensity of selection is high for proteins performing essential housekeeping functions.

  • Selection intensity not only varies between proteins but also between amino acid residues within a protein.

  • Mutations involving changes between biochemically dissimilar amino acids are under high selection pressure.

  • Amino acid polymorphisms are abundant in human populations due to genetic drift.

  • The evolutionary history, structure and functional properties of human proteins are useful for identifying amino acid mutations associated with genetic diseases.

Keywords: natural selection; protein evolution; genetic drift; polymorphisms; human genetic diseases

Figure 1.

Hierarchical levels of selection.

Figure 2.

Histogram of the average ω (KA/KS) of human proteins under different magnitudes of selection. Divergence at synonymous (KS) and nonsynonymous (KA) positions was estimated for the human–chimpanzee species pair (13 454 protein‐coding genes) using data obtained from Mikkelsen et al. (2005).
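As a minimal sketch of how such a histogram could be assembled, the following Python computes ω as the ratio KA/KS for each gene and counts genes per ω bin. The function names, the bin width, the overflow bin for ω ≥ 1 and the (KA, KS) pairs are all illustrative assumptions; they are not taken from Mikkelsen et al. or from the figure.

```python
from collections import Counter

def omega(ka, ks):
    """Return KA/KS, or None when KS is zero and the ratio is undefined."""
    if ks <= 0:
        return None
    return ka / ks

def histogram(omegas, bin_width=0.25, cap=1.0):
    """Count omega values per bin; values at or above `cap` share one bin."""
    counts = Counter()
    for w in omegas:
        if w is None:
            continue  # skip genes with no synonymous divergence
        if w >= cap:
            counts[f">={cap}"] += 1
        else:
            lo = int(w / bin_width) * bin_width
            counts[f"{lo:.2f}-{lo + bin_width:.2f}"] += 1
    return dict(counts)

# Hypothetical (KA, KS) pairs for five genes; not real estimates.
pairs = [(0.002, 0.012), (0.0, 0.010), (0.006, 0.011),
         (0.015, 0.012), (0.003, 0.0)]
omegas = [omega(ka, ks) for ka, ks in pairs]
print(histogram(omegas))
```

Genes with KS = 0 are dropped here because the ratio is undefined; a real analysis might instead pool such genes or apply a correction, which this sketch does not attempt.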

Figure 3.

Median ω (KA/KS) of human proteins belonging to different biological processes (Ashburner et al., 2000). The vertical dashed line represents the median ω estimated for the complete human–chimpanzee proteomes. Data obtained from Mikkelsen et al. (2005).

Figure 4.

Proportion of amino acid positions and their relative evolutionary conservation. The proteins of the genes IVD and CRB1 were used in this analysis. The relative conservation of amino acid positions was estimated using multiple sequence alignments consisting of human proteins and their orthologous counterparts from mouse, chicken, puffer fish and fruit fly. The evolutionary rate of the amino acid positions was estimated using the maximum likelihood method with a discrete gamma function (Yang, ), and the rate estimates were normalised to a 0–1 scale. The shape parameters (α) of the gamma functions are 0.6 and 1.5 for IVD and CRB1, respectively.
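To illustrate how the gamma shape parameter α governs the spread of rates among sites, the sketch below draws per‐site rates from a continuous gamma distribution with mean 1 and rescales them to a 0–1 range. This is a simulation, not the maximum likelihood estimation with a discrete gamma function described in the caption. The min–max rescaling is an assumed reading of "normalised to a 0–1 scale", and the function names, site count and threshold are hypothetical.

```python
import random

def simulate_site_rates(alpha, n_sites, seed=0):
    """Draw per-site rates from a gamma distribution with shape alpha, mean 1."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1/alpha gives mean 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def normalise(rates):
    """Min-max rescale rates to the 0-1 range (assumed normalisation)."""
    lo, hi = min(rates), max(rates)
    if hi == lo:
        return [0.0] * len(rates)
    return [(r - lo) / (hi - lo) for r in rates]

def fraction_below(values, threshold):
    """Proportion of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)

# Shape values quoted in the caption; the draws themselves are simulated.
for alpha in (0.6, 1.5):
    rates = simulate_site_rates(alpha, 50_000)
    print(alpha, round(fraction_below(rates, 0.1), 3))
```

With a smaller α, a larger share of simulated sites falls near zero rate, so a low shape parameter corresponds to many highly conserved positions alongside a few fast‐evolving ones.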

Figure 5.

Temporal patterns of amino acid polymorphisms (or substitutions) in the human lineage. (a) An illustration showing the decline of amino acid polymorphisms over time. (b) Phylogeny of chimpanzee and humans from different populations. A–F denote the branches of the tree in ascending order of time. (A) and (F) are the oldest and youngest branches of the tree, respectively. (c) ω (KA/KS) estimates obtained for each branch of the tree (data from Subramanian, ). The ω values were estimated using the nonsynonymous and synonymous polymorphisms that were specific to a single European (F), those shared between the two Europeans (E), Europeans and Asian (D), Eurasians and Yoruban (West African) (C), Khoisan (an ancient African tribe) and other humans (B). The interspecies ω was estimated for the human–chimpanzee pair (A) using the divergence at nonsynonymous and synonymous sites.
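A per‐branch ω of this kind could be computed from counts of nonsynonymous and synonymous changes, as in the rough Python sketch below. It uses uncorrected proportions (changes per site, with no multiple‐hit correction), which is a simplification and not necessarily the estimator used in the cited study. The function name, the site totals and all counts are hypothetical, chosen only so that ω rises toward the younger branches.

```python
def omega_from_counts(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Return (nonsyn changes per nonsyn site) / (syn changes per syn site).

    Returns None when the ratio is undefined.
    """
    if n_syn == 0 or nonsyn_sites == 0 or syn_sites == 0:
        return None
    ka = n_nonsyn / nonsyn_sites
    ks = n_syn / syn_sites
    return ka / ks

# Hypothetical site totals and per-branch counts (nonsynonymous, synonymous).
NONSYN_SITES, SYN_SITES = 3000, 1000
branch_counts = {
    "A": (30, 60),  # oldest branch (interspecies divergence)
    "C": (24, 30),
    "F": (18, 12),  # youngest branch (variants private to one genome)
}

for branch, (n, s) in branch_counts.items():
    w = omega_from_counts(n, s, NONSYN_SITES, SYN_SITES)
    print(branch, round(w, 3))
```

Dividing by the number of sites of each class matters because nonsynonymous sites outnumber synonymous sites in coding sequence; comparing raw counts would overstate ω.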



Adzhubei IA, Schmidt S, Peshkin L et al. (2010) A method and server for predicting damaging missense mutations. Nature Methods 7: 248–249.

Alba MM and Castresana J (2005) Inverse relationship between evolutionary rate and age of mammalian genes. Molecular Biology and Evolution 22: 598–606.

Arbiza L, Duchi S, Montaner D et al. (2006) Selective pressures at a codon‐level predict deleterious mutations in human disease genes. Journal of Molecular Biology 358: 1390–1404.

Ashburner M, Ball CA, Blake JA et al. (2000) Gene ontology: tool for the unification of biology. Nature Genetics 25: 25–29.

Bromberg Y and Rost B (2007) SNAP: predict effect of non‐synonymous polymorphisms on function. Nucleic Acids Research 35: 3823–3835.

Bustamante CD, Fledel‐Alon A, Williamson S et al. (2005) Natural selection on protein‐coding genes in the human genome. Nature 437: 1153–1157.

Carmel L and Koonin EV (2009) A universal nonmonotonic relationship between gene compactness and expression levels in multicellular eukaryotes. Genome Biology and Evolution 1: 382–390.

Challis CJ and Schmidler SC (2012) A stochastic evolutionary model for protein structure alignment and phylogeny. Molecular Biology and Evolution 29: 3575–3587.

Chen FC, Liao BY, Pan CL, Lin HY and Chang AY (2012) Assessing determinants of exonic evolutionary rates in mammals. Molecular Biology and Evolution 29: 3121–3129.

Choi SS, Vallender EJ and Lahn BT (2006) Systematically assessing the influence of 3‐dimensional structural context on the molecular evolution of mammalian proteomes. Molecular Biology and Evolution 23: 2131–2133.

Duret L and Mouchiroud D (2000) Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Molecular Biology and Evolution 17: 68–74.

Eyre‐Walker A, Keightley PD, Smith NG and Gaffney D (2002) Quantifying the slightly deleterious mutation model of molecular evolution. Molecular Biology and Evolution 19: 2142–2149.

Eyre‐Walker A, Woolfit M and Phelps T (2006) The distribution of fitness effects of new deleterious amino acid mutations in humans. Genetics 173: 891–900.

Gibbs RA, Rogers J, Katze MG et al. (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316: 222–234.

Hasegawa M, Cao Y and Yang Z (1998) Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Molecular Biology and Evolution 15: 1499–1505.

Henn BM, Gignoux CR, Feldman MW and Mountain JL (2009) Characterizing the time dependency of human mitochondrial DNA mutation rate estimates. Molecular Biology and Evolution 26: 217–230.

Ho SYW, Phillips MJ, Cooper A and Drummond AJ (2005) Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Molecular Biology and Evolution 22: 1561–1568.

Jones DT, Taylor WR and Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences 8: 275–282.

Kann MG, Thiessen PA, Panchenko AR et al. (2005) A structure‐based method for protein sequence alignment. Bioinformatics 21: 1451–1456.

Li Y, Vinckenbosch N, Tian G et al. (2010) Resequencing of 200 human exomes identifies an excess of low‐frequency non‐synonymous coding variants. Nature Genetics 42: 969–972.

Liao BY, Scott NM and Zhang J (2006) Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Molecular Biology and Evolution 23: 2072–2080.

Mikkelsen TS, Hillier LW, Eichler EE et al. (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437: 69–87.

Nelson MR, Wegmann D, Ehm MG et al. (2012) An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337: 100–104.

Ohta T (1992) The nearly neutral theory of molecular evolution. Annual Review of Ecology and Systematics 23: 263–286.

Parmley JL, Urrutia AO, Potrzebowski L, Kaessmann H and Hurst LD (2007) Splicing and the evolution of proteins in mammals. PLoS Biology 5: e14.

Piganeau G and Eyre‐Walker A (2009) Evidence for variation in the effective population size of animal mitochondrial DNA. PLoS One 4: e4396.

Popadin KY, Nikolaev SI, Junier T, Baranova M and Antonarakis SE (2012) Purifying selection in mammalian mitochondrial protein‐coding genes is highly effective and congruent with evolution of nuclear genes. Molecular Biology and Evolution 30: 347–355.

Rand DM and Kann LM (1996) Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans. Molecular Biology and Evolution 13: 735–748.

Sim NL, Kumar P, Hu J et al. (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40: W452–W457.

Stenson PD, Mort M, Ball EV et al. (2009) The human gene mutation database: 2008 update. Genome Medicine 21: 577–581.

Subramanian S (2009) Temporal trails of natural selection in human mitogenomes. Molecular Biology and Evolution 26: 715–717.

Subramanian S (2012a) Quantifying harmful mutations in human populations. European Journal of Human Genetics 20: 1320–1322.

Subramanian S (2012b) The abundance of deleterious polymorphisms in humans. Genetics 190: 1579–1583.

Subramanian S and Kumar S (2004) Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168: 373–381.

Subramanian S and Lambert D (2011) Time dependency of molecular evolutionary rates? Yes and No. Genome Biology and Evolution 3: 1324–1328.

Tennessen JA, Bigham AW, O'Connor TD et al. (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337: 64–69.

Wang Z and Moult J (2001) SNPs, protein structure, and disease. Human Mutation 17: 263–270.

Wolf YI, Novichkov PS, Karev GP, Koonin EV and Lipman DJ (2009) The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proceedings of the National Academy of Sciences of the USA 106: 7273–7280.

Yampolsky LY, Kondrashov FA and Kondrashov AS (2005) Distribution of the strength of selection against amino acid replacements in human proteins. Human Molecular Genetics 14: 3191–3201.

Yang Z (1996) Among‐site rate variation and its impact on phylogenetic analyses. Trends in Ecology and Evolution 11: 367–372.

Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24: 1586–1591.

Zhu Q, Ge D, Maia JM et al. (2011) A genome‐wide comparison of the functional properties of rare and common genetic variants in humans. American Journal of Human Genetics 88: 458–468.

Further Reading

Kimura M (1983) The Neutral Theory of Molecular Evolution. Cambridge, UK: Cambridge University Press.

Li W‐H (1997) Molecular Evolution. Sunderland, MA: Sinauer Associates, Inc.

Nei M and Kumar S (2000) Molecular Evolution and Phylogenetics. New York: Oxford University Press.

Ng PC and Henikoff S (2006) Predicting the effects of amino acid substitutions on protein function. Annual Review of Genomics and Human Genetics 7: 61–80.

Yang Z (2006) Computational Molecular Evolution. Oxford: Oxford University Press.

How to Cite
Subramanian, Sankar (Mar 2013) Selection against Amino Acid Replacements in Human Proteins. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0020859.pub2]