Single Nucleotide Polymorphisms in Human Disease and Evolution: Phylogenies and Genealogies


Single nucleotide polymorphisms (SNPs) are common genetic variants within a population. Several SNPs have been implicated directly in human diseases, but their greatest promise lies in their use as genetic markers for discovering the genetic factors underlying a wide variety of human diseases and other traits. There are two ways to model the processes by which population genetic variation has evolved: a genealogy, the actual family tree in which genetic variants arose and were passed from parents to children; and a phylogeny, a compressed history of how these variants accumulated in lineages of successive copies of deoxyribonucleic acid (DNA) sequences. Genealogies and phylogenies are two views of the same evolutionary process, and each has specific applications to the discovery of genetic factors in human disease, which is especially relevant in the genomic era.

Keywords: population genetics; polymorphism; disease; evolution; phylogenetics

Figure 1.

Genetic differences between two homologous chromosome regions. Three single‐nucleotide differences are shown: an A/G difference (transition), a C/T difference (transition) and a G/T difference (transversion). Each of these three differences may appear at a different frequency in the population.
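
The transition/transversion distinction used in this caption can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `classify_substitution` is hypothetical and not part of the article. It relies on the standard definition that a transition swaps two purines (A, G) or two pyrimidines (C, T), while a transversion swaps a purine for a pyrimidine or vice versa.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(base1: str, base2: str) -> str:
    """Classify a single-nucleotide difference as a transition or transversion."""
    b1, b2 = base1.upper(), base2.upper()
    valid = PURINES | PYRIMIDINES
    if b1 not in valid or b2 not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if b1 == b2:
        raise ValueError("identical bases are not a substitution")
    # Same chemical class (both purines or both pyrimidines) -> transition.
    same_class = {b1, b2} <= PURINES or {b1, b2} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The three differences described in the caption.
for pair in [("A", "G"), ("C", "T"), ("G", "T")]:
    print(pair, classify_substitution(*pair))
```

The classification depends only on the two bases observed, not on which of them is ancestral, so the order of the arguments does not matter.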

Figure 2.

Probability that an allele will eventually reach a frequency of 100% (fixation), given its current frequency. Each curve represents a different coefficient of selection, s, conferred by the allele: positive values of s indicate positive selection, and negative values indicate negative selection. When the selection pressure is small (|s|<∼0.01% in an effective population of Ne=10 000), the allele behaves as if neutral (diagonal line). When negative selection is strong (s<–0.1%), it becomes very improbable that the allele will increase in frequency enough to result in polymorphism. Probability is calculated according to Kimura (1983) as y=[1−exp(−Sx)]/[1−exp(−S)], where x is the current allele frequency, y is the probability of reaching 100% and S=4Nes. The curves assume an effective population size of Ne=10 000, the estimated value during most of human evolution.
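
As a rough illustration of the formula in this caption, the following Python sketch evaluates y=[1−exp(−Sx)]/[1−exp(−S)] with S=4Nes. The function name `fixation_probability` and its parameter names are assumptions made for this example, and the default of Ne=10 000 is taken from the caption. The neutral case (s=0) is handled separately as the limit y=x, which corresponds to the diagonal line.

```python
import math


def fixation_probability(x: float, s: float, ne: float = 10_000) -> float:
    """Probability that an allele at frequency x eventually reaches 100%.

    Evaluates y = [1 - exp(-S x)] / [1 - exp(-S)] with S = 4 * Ne * s.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a frequency between 0 and 1")
    S = 4.0 * ne * s
    if S == 0.0:
        # Neutral limit of the formula: y = x (the diagonal line).
        return x
    if S > 0.0:
        # expm1(z) = exp(z) - 1; the two minus signs cancel in the ratio.
        return math.expm1(-S * x) / math.expm1(-S)
    # S < 0: algebraically identical form (numerator and denominator
    # multiplied by exp(S)) that avoids overflow when -S is large.
    return math.exp(S * (1.0 - x)) * math.expm1(S * x) / math.expm1(S)


# A new variant at 1% frequency under different selection coefficients.
for s in (-0.001, -0.0001, 0.0, 0.0001, 0.001):
    print(f"s = {s:+.4f}  P(fixation) = {fixation_probability(0.01, s):.3e}")
```

Using `math.expm1` rather than `1 - math.exp(...)` is a design choice to keep precision when S is close to zero, where the numerator and denominator are both very small.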

Figure 3.

A toy example illustrating the evolution of SNPs, seen from (a) the genealogical view and (b) the phylogenetic view. The starting population is genetically homogeneous and the population size is constant. (a) Individuals are shown as large open circles, with two chromosomes (grey bars). Mutations (copying errors) are shown as closed circles on the chromosomes. When variants arising from mutation are faithfully copied into following generations, they are shown as open circles. When a variant reaches 1% in the population, it is shown as a star. In this example, allele a (arising from an erroneous substitution of the parental allele A) is created by a mutation when an individual from the first generation copies its chromosomes and passes allele a to its children. After N generations, allele a has drifted to 1% frequency. Sometime during these N generations, allele b has already arisen as a mutation of allele B on a chromosome carrying allele a. After an additional M generations, b reaches 1% frequency. The frequency of allele a must be greater than the frequency of allele b, since allele a can be found with both alleles B and b, but allele b is only found with allele a (because it was created on a chromosome with allele a, and is so close to a that it has not been separated by recombination). (b) Each unique DNA sequence is represented by an oval, with branches in the SNP phylogeny represented by arrows. Before N generations have passed, the AB allele is the only unique high‐frequency variant. After N generations, allele a reaches 1% frequency, represented as the aB allele (with the mutation labelled Aa on the branch). After M additional generations, the ab allele reaches 1% frequency as a phylogenetic ‘child’ of the aB allele. To help illustrate how phylogenies are represented, we also show an alternate tree that does not correspond to the genealogy shown in (a). This is the tree that would have resulted if allele b had arisen by mutation on an AB chromosome, instead of an aB chromosome.
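
The reasoning in this caption can be checked on a small sample of two-site haplotypes. The sketch below is illustrative only: the function names `allele_frequencies` and `background_of_b`, the haplotype labels "AB", "aB", "ab" and "Ab", and the sample counts are all assumptions made for the example, and it presumes no recombination or recurrent mutation between the two sites. It computes the frequencies of alleles a and b and reports which background allele b is found on, which is what distinguishes the tree matching the genealogy in (a) from the alternate tree.

```python
from collections import Counter


def allele_frequencies(haplotypes):
    """Return (frequency of allele a, frequency of allele b) in a sample."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    freq_a = (counts["aB"] + counts["ab"]) / total
    freq_b = (counts["Ab"] + counts["ab"]) / total
    return freq_a, freq_b


def background_of_b(haplotypes):
    """Report which allele at the first site is found together with b."""
    present = set(haplotypes)
    on_a, on_A = "ab" in present, "Ab" in present
    if on_a and on_A:
        return "both"  # would require recombination or recurrent mutation
    if on_a:
        return "a"     # tree AB -> aB -> ab, as in the genealogy of (a)
    if on_A:
        return "A"     # alternate tree: aB and Ab both derive from AB
    return "none"      # allele b not observed in the sample


# Hypothetical sample of 100 chromosomes.
sample = ["AB"] * 95 + ["aB"] * 3 + ["ab"] * 2
freq_a, freq_b = allele_frequencies(sample)
print(freq_a, freq_b, background_of_b(sample))
```

When b is found only on the a background, every chromosome carrying b also carries a, so the frequency of a cannot be lower than that of b in any such sample.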



Antonarakis SE, Krawczak M and Cooper DN (2000) Disease‐causing mutations in the human genome. European Journal of Pediatrics 159(suppl. 3): S173–S178.

Ardlie KG, Kruglyak L and Seielstad M (2002) Patterns of linkage disequilibrium in the human genome. Nature Reviews. Genetics 3: 299–309.

Asthana S, Schmidt S and Sunyaev S (2005) A limited role for balancing selection. Trends in Genetics 21: 30–32.

Becker KG, Barnes KC, Bright TJ and Wang SA (2004) The genetic association database. Nature Genetics 36: 431–432.

Boerwinkle E (1996) A contemporary research paradigm for the genetic analysis of a common chronic disease. Annals of Medicine 28: 451–457.

Botstein D and Risch N (2003) Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nature Genetics 33(suppl.): 228–237.

Bubb KL, Bovee D, Buckley D et al. (2006) Scan of human genome reveals no new loci under ancient balancing selection. Genetics 173: 2165–2177.

Carlson CS, Eberle MA, Kruglyak L and Nickerson DA (2004) Mapping complex disease loci in whole‐genome association studies. Nature 429: 446–452.

Chen JM, Cooper DN, Chuzhanova N, Ferec C and Patrinos GP (2007) Gene conversion: mechanisms, evolution and human disease. Nature Reviews. Genetics 8: 762–775.

Chou CY, Lin YL, Huang YC et al. (2005) Structural variation in human apolipoprotein E3 and E4: secondary structure, tertiary structure, and size distribution. Biophysical Journal 88: 455–466.

Cohen JC, Kiss RS, Pertsemlidis A et al. (2004) Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science 305: 869–872.

Cooper DN and Clayton JF (1988) DNA polymorphism and the study of disease associations. Human Genetics 78: 299–312.

Cooper RS (2003) Gene‐environment interactions and the etiology of common complex disease. Annals of Internal Medicine 139: 437–440.

Corbo RM and Scacchi R (1999) Apolipoprotein E (APOE) allele distribution in the world. Is APOE*4 a ‘thrifty’ allele? Annals of Human Genetics 63: 301–310.

Corder EH, Saunders AM, Strittmatter WJ et al. (1993) Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261: 921–923.

Di Rienzo A and Hudson RR (2005) An evolutionary framework for common diseases: the ancestral‐susceptibility model. Trends in Genetics 21: 596–601.

Eyre‐Walker A and Keightley PD (2007) The distribution of fitness effects of new mutations. Nature Reviews. Genetics 8: 610–618.

Felsenstein J (2004) Inferring Phylogenies. New York: Sinauer, Inc.

Frazer KA, Ballinger DG, Cox DR et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861.

Frosst P, Blom HJ, Milos R et al. (1995) A candidate genetic risk factor for vascular disease: a common mutation in methylenetetrahydrofolate reductase. Nature Genetics 10: 111–113.

Fullerton SM, Clark AG, Weiss KM et al. (2000) Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. American Journal of Human Genetics 67: 881–900.

Glazier AM, Nadeau JH and Aitman TJ (2002) Finding genes that underlie complex traits. Science 298: 2345–2349.

Goldman N, Anderson JP and Rodrigo AG (2000) Likelihood‐based tests of topologies in phylogenetics. Systematic Biology 49: 652–670.

Guenther BD, Sheppard CA and Tran P (1999) The structure and properties of methylenetetrahydrofolate reductase from Escherichia coli suggest how folate ameliorates human hyperhomocysteinemia. Nature Structural Biology 6: 359–365.

Halapi E, Stefansson K and Hakonarson H (2004) Population genomics of drug response. American Journal of Pharmacogenomics 4: 73–82.

Hedges SB, Dudley J and Kumar S (2006) TimeTree: a public knowledge‐base of divergence times among organisms. Bioinformatics 22: 2971–2972.

Hein J (1990) Reconstructing evolution of sequences subject to recombination using parsimony. Mathematical Biosciences 98: 185–200.

Hirschhorn JN and Daly MJ (2005) Genome‐wide association studies for common diseases and complex traits. Nature Reviews. Genetics 6: 95–108.

International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437: 1299–1320.

Jukes TH and Cantor CR (1969) Evolution of protein molecules. In: Munro HM (ed.) Mammalian Protein Metabolism, vol III, pp 21–132. New York: Academic Press.

Kang SS, Wong PW and Susmano A (1991) Thermolabile methylenetetrahydrofolate reductase: an inherited risk factor for coronary artery disease. American Journal of Human Genetics 48: 536–545.

Kimura M (1968) Evolutionary rate at the molecular level. Nature 217: 624–626.

Kimura M (1970) The length of time required for a selectively neutral mutant to reach fixation through random frequency drift in a finite population. Genetical Research 15: 131–133.

Kimura M (1983) The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.

Kryukov GV, Pennacchio LA and Sunyaev SR (2007) Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American Journal of Human Genetics 80: 727–739.

Kumar S, Filipski A, Swarna V, Walker A and Hedges SB (2005) Placing confidence limits on the molecular age of the human–chimpanzee divergence. Proceedings of the National Academy of Sciences of the USA 102: 18842–18847.

Lander ES (1996) The new genomics: global views of biology. Science 274: 536–539.

Lee CT, Risom T and Strauss WM (2007) Evolutionary conservation of microRNA regulatory circuits: an examination of microRNA gene complexity and conserved microRNA‐target interactions through metazoan phylogeny. DNA and Cell Biology 26: 209–218.

Levy S, Sutton G, Ng PC et al. (2007) The diploid genome sequence of an individual human. PLoS Biology 5: e254.

Lohmueller KE, Pearce CL, Pike M, Lander ES and Hirschhorn JN (2003) Meta‐analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nature Genetics 33: 177–182.

Mahley RW and Rall Jr., SC (1999) Is epsilon4 the ancestral human apoE allele? Neurobiology of Aging 20: 429–430.

Marjoram P and Tavare S (2006) Modern computational approaches for analysing molecular genetic variation data. Nature Reviews. Genetics 7: 759–770.

McCarroll SA and Altshuler DM (2007) Copy‐number variation and association studies of human disease. Nature Genetics 39: S37–S42.

McCarroll SA, Hadnott TN, Perry GH et al. (2006) Common deletion polymorphisms in the human genome. Nature Genetics 38: 86–92.

McKusick VA (2007) Mendelian inheritance in man and its online version, OMIM. American Journal of Human Genetics 80: 588–604.

McKusick VA and Amberger JS (1993) The morbid anatomy of the human genome: chromosomal location of mutations causing disease. Journal of Medical Genetics 30: 1–26.

Morris AP (2007) Coalescent methods for fine‐scale disease‐gene mapping. Methods in Molecular Biology 376: 123–140.

Morrow JA, Hatters DM and Lu B (2002) Apolipoprotein E4 forms a molten globule. A potential basis for its association with disease. The Journal of Biological Chemistry 277: 50380–50385.

Nei M (1996) Phylogenetic analysis in molecular evolutionary genetics. Annual Review of Genetics 30: 371–403.

Ng PC and Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Research 11: 863–874.

Ohta T (1973) Slightly deleterious mutant substitutions in evolution. Nature 246: 96–98.

Ou CY, Stevenson RE and Brown VK (1996) 5,10 methylenetetrahydrofolate reductase genetic polymorphism as a risk factor for neural tube defects. American Journal of Medical Genetics 63: 610–614.

Philippe H and Douady CJ (2003) Horizontal gene transfer and phylogenetics. Current Opinion in Microbiology 6: 498–505.

Pritchard JK (2001) Are rare variants responsible for susceptibility to complex diseases? American Journal of Human Genetics 69: 124–137.

Ray R, Schnoll RA and Lerman C (2007) Pharmacogenetics and smoking cessation with nicotine replacement therapy. CNS Drugs 21: 525–533.

Reich DE and Lander ES (2001) On the allelic spectrum of human disease. Trends in Genetics 17: 502–510.

Rhee H and Lee JS (2007) PADB: published association database. BMC Bioinformatics 8: 348.

Risch N (1990) Linkage strategies for genetically complex traits. I. Multilocus models. American Journal of Human Genetics 46: 222–228.

Roden DM, Altman RB, Benowitz NL et al. (2006) Pharmacogenomics: challenges and opportunities. Annals of Internal Medicine 145: 749–757.

Rosenberg NA and Nordborg M (2002) Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nature Reviews. Genetics 3: 380–390.

Roses AD (1997) A model for susceptibility polymorphisms for complex diseases: apolipoprotein E and Alzheimer disease. Neurogenetics 1: 3–11.

Sabeti PC, Varilly P, Fry B et al. (2007) Genome‐wide detection and characterization of positive selection in human populations. Nature 449: 913–918.

Sommer SS and Ketterling RP (1996) The factor IX gene as a model for analysis of human germline mutations: an update. Human Molecular Genetics 5(Spec No): 1505–1514.

Stenson PD, Ball EV, Mort M et al. (2003) Human Gene Mutation Database (HGMD): 2003 update. Human Mutation 21: 577–581.

Stone EA and Sidow A (2005) Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Research 15: 978–986.

Strittmatter WJ, Saunders AM, Schmechel D et al. (1993) Apolipoprotein E: high‐avidity binding to beta‐amyloid and increased frequency of type 4 allele in late‐onset familial Alzheimer disease. Proceedings of the National Academy of Sciences of the USA 90: 1977–1981.

Sunyaev S, Ramensky V, Koch I et al. (2001) Prediction of deleterious human alleles. Human Molecular Genetics 10: 591–597.

Templeton AR, Crandall KA and Sing CF (1992) A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132: 619–633.

Templeton AR, Maxwell T, Posada D et al. (2005) Tree scanning: a method for using haplotype trees in phenotype/genotype association studies. Genetics 169: 441–453.

The International HapMap Consortium (2003) The International HapMap Project. Nature 426: 789–796.

Thomas PD and Kejariwal A (2004) Coding single‐nucleotide polymorphisms associated with complex vs. Mendelian disease: evolutionary evidence for differences in molecular effects. Proceedings of the National Academy of Sciences of the USA 101: 15398–15403.

Venkatesh B and Yap WH (2005) Comparative genomics using fugu: a tool for the identification of conserved vertebrate cis‐regulatory elements. BioEssays 27: 100–107.

Visel A, Bristow J and Pennacchio LA (2007) Enhancer identification through comparative genomics. Seminars in Cell & Developmental Biology 18: 140–152.

Wellcome Trust Case Control Consortium (2007) Genome‐wide association study of 14 000 cases of seven common diseases and 3000 shared controls. Nature 447: 661–678.

Wilke RA, Lin DW, Roden DM et al. (2007) Identifying genetic risk factors for serious adverse drug reactions: current progress and challenges. Nature Reviews. Drug Discovery 6: 904–916.

Williamson SH, Hubisz MJ, Clark AG et al. (2007) Localizing recent adaptive evolution in the human genome. PLoS Genetics 3: e90.

Wiuf C and Hein J (1999) The ancestry of a sample of sequences subject to recombination. Genetics 151: 1217–1228.

Wu Y and Gusfield D (2007) Efficient computation of minimum recombination with genotypes (not haplotypes). Journal of Bioinformatics and Computational Biology 5: 181–200.

Further Reading

Felsenstein J (2004) Inferring Phylogenies. New York: Sinauer, Inc.

Kimura M (1983) The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.

Reich DE and Lander ES (2001) On the allelic spectrum of human disease. Trends in Genetics 17: 502–510.

Rosenberg NA and Nordborg M (2002) Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nature Reviews. Genetics 3: 380–390.

How to Cite
Thomas, Paul D (Jul 2008) Single Nucleotide Polymorphisms in Human Disease and Evolution: Phylogenies and Genealogies. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0020763]