Computational Methods in SNP Analysis


Variations in the genome underlie many forms of human disease, ranging from monogenic to complex disorders. Of the roughly 3.2 billion nucleotides that make up the human genome, only about 0.1% differ between two randomly selected individuals. The simplest form of deoxyribonucleic acid (DNA) variation among individuals is the substitution of one nucleotide for another at a homologous site; such a variant, when present in a population, is called a single nucleotide polymorphism (SNP). With the growing number of SNPs deposited in public databases, assessing their functional effects by classical genetic methods remains a major challenge. To support this effort, a new branch of computational biology has emerged to distinguish functional SNPs from neutral ones. This article places special emphasis on the existing computational methods for SNP analysis that are potentially most relevant to clinicians undertaking large‐scale analyses.
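To put the 0.1% figure in absolute terms: if it is read as the fraction of the roughly 3.2 billion nucleotides at which two randomly chosen genomes differ (a simplifying assumption), the expected number of differing positions is about 3.2 million.

```latex
N_{\text{diff}} \approx 0.001 \times \left(3.2 \times 10^{9}\right) = 3.2 \times 10^{6}\ \text{nucleotides}
```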

Key Concepts:

  • Deoxyribonucleic acid (DNA) is the molecule that carries the genetic information in living organisms. It has a double‐helix structure consisting of two complementary strands of nucleotides.

  • DNA is built from nucleotides carrying one of four bases: adenine (A), guanine (G), thymine (T) and cytosine (C).

  • Genomic variants include insertions, deletions, copy number variations (CNVs) and single nucleotide polymorphisms (SNPs).

  • An SNP in a coding region whose nucleotide change does not alter the encoded amino acid, so that the polypeptide sequence stays the same, is termed a synonymous SNP (csSNP).

  • An SNP in a coding region whose nucleotide change alters the encoded amino acid, producing a different polypeptide sequence, is termed a nonsynonymous SNP (nsSNP).

  • Missense mutations are mutations that result in a different amino acid.

  • Frameshift mutations are genetic mutations caused by insertions or deletions of a number of nucleotides that is not a multiple of three, which shifts the reading frame of the coding sequence.

  • Nonsense mutations are mutations that result in a premature stop codon.

  • Functional impacts of nsSNPs generally fall into two classes, namely disease‐associated (deleterious) and benign/neutral (no observable phenotypic effect).
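The coding‐region definitions above can be made concrete with a short classifier. This is a minimal sketch for illustration only, not a method described in this article: it assumes an in‐frame, sense‐strand coding sequence with 0‐based positions and the standard genetic code, and the helper names classify_snv and is_frameshift are hypothetical. Real annotation tools must also deal with transcripts, strand, splice sites and overlapping genes.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# T, C, A, G order at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify replacing cds[pos] with alt in an in-frame coding sequence.

    Hypothetical helper: cds is the sense strand read 5'->3', starting at the
    first base of a codon; pos is 0-based.
    """
    cds, alt = cds.upper(), alt.upper()
    if alt not in BASES or alt == cds[pos]:
        raise ValueError("alt must be A, C, G or T and differ from the reference base")
    start = pos - pos % 3                 # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position lies in an incomplete trailing codon")
    offset = pos - start                  # 0, 1 or 2 within the codon
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"               # same amino acid, same polypeptide
    if alt_aa == "*":
        return "nonsense"                 # premature stop codon
    if ref_aa == "*":
        return "stop_loss"                # stop codon lost (not covered by the list above)
    return "missense"                     # different amino acid (nsSNP)


def is_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


if __name__ == "__main__":
    cds = "ATGGCTTGGTAA"                  # codons ATG GCT TGG TAA: Met-Ala-Trp-stop
    print(classify_snv(cds, 5, "C"))      # GCT -> GCC, Ala -> Ala: synonymous
    print(classify_snv(cds, 3, "A"))      # GCT -> ACT, Ala -> Thr: missense
    print(classify_snv(cds, 8, "A"))      # TGG -> TGA, Trp -> stop: nonsense
    print(is_frameshift(1), is_frameshift(3))  # True False
```

Working codon by codon keeps the logic close to the definitions: a substitution only matters through the amino acid its codon encodes, whereas an indel is judged solely by whether its length preserves the three‐base reading frame.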

Keywords: genomic variation; mutations; SNPs; nsSNPs; deleterious; neutral; computational methods

Figure 1. Classification of SNPs.

Further Reading

Barnes MR and Breen G (eds) (2010) Genetic Variation: Methods and Protocols. Methods in Molecular Biology, vol. 628, 366 pp. Totowa, NJ: Humana Press.

George Priya Doss C, Sudandiradoss C, Rajasekaran R et al. (2008) Application of computational algorithm tools to identify functional SNPs. Functional and Integrative Genomics 8: 309–316.

Karchin R (2009) Next generation tools for the annotation of human SNPs. Briefings in Bioinformatics 10: 35–52.

Kwok P‐Y (ed.) (2003) Single Nucleotide Polymorphisms. Methods in Molecular Biology, vol. 212. Totowa, NJ: Humana Press.

Mooney S (2005) Bioinformatics approaches and resources for single nucleotide polymorphism functional analysis. Briefings in Bioinformatics 6: 44–56.

Sethumadhavan R, Doss CG and Rajasekaran R (2011) In silico searching for disease‐associated functional DNA variants. Methods in Molecular Biology 760: 239–250. Totowa, NJ: Humana Press.

Yu B and Hinchcliffe M (eds) (2011) In Silico Tools for Gene Discovery, XI, 365 pp. Totowa, NJ: Humana Press.

How to Cite
George Priya Doss C (May 2014) Computational Methods in SNP Analysis. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0025492]