Parametric and Nonparametric Linkage Analysis

Abstract

Parametric, or model‐based, linkage analysis assumes that the models describing both the trait and the genetic marker loci are known without error, although sensitivity analyses allow one to account for uncertainty in the trait model. Nonparametric (model‐free or weakly parametric) linkage methods make fewer assumptions about the trait model. Family‐based linkage analysis is, in general, most powerful for detecting genetic variants that have large effects on the risk of a disease or on the variation of a quantitative trait. Such variants tend to be rare in the population for any disease that is likely to have undergone negative selection over evolutionary time. Linkage studies are more powerful than genome‐wide association studies (GWAS) for detecting genes with rare, high‐penetrance risk variants, whereas GWAS are more powerful for detecting common risk variants, which tend to have small individual effects on risk. Thus, both linkage and association strategies remain useful in the era of whole‐genome sequencing. Both parametric and nonparametric linkage methods are useful for detecting regions of the genome harbouring high‐penetrance risk variants, which are of particular interest for precision medicine.

Key Concepts

  • Linkage analysis evaluates the likelihood that a specific allele or haplotype of alleles co‐segregates with a disease or trait in a family or group of families.
  • Linkage analysis searches for evidence of a violation of Mendel's Law of Independent Assortment, that is, evidence that alleles at two loci are not transmitted to offspring independently of each other.
  • Linkage arises because long stretches of DNA on a specific chromosome are transmitted together to an offspring.
  • Genetic loci on different chromosomes are not linked and do segregate independently to offspring.
  • Genetic loci that are close together on the same chromosome show some evidence of linkage, but loci that are far apart on the same chromosome may not, because chiasmata occur at the four‐strand stage of meiosis and result in recombination between the chromatids of the parental pair of chromosomes.
  • Linkage is quantified by the recombination fraction, that is, the probability of recombination between two loci (such as a risk allele for a disease and a genetic variant somewhere in the genome), which is estimated from the observed genotypes of the parents and offspring.
  • In parametric linkage, a model is assumed that specifies the mode of inheritance, the allele frequencies and the penetrance of every genotype at the disease locus (or, for a quantitative trait, the probability of a given trait value for each genotype). The model also specifies the mode of inheritance, the allele frequencies and the probability of observing each laboratory phenotype given the genotype at the genotyped marker loci (usually codominant and 100% for modern single‐nucleotide polymorphism and DNA sequencing genotypes).
  • In nonparametric linkage, no specific assumptions are made about the trait model, but the genotyped marker locus model must be fully described as above for parametric linkage.
  • The power of these two approaches depends on many different factors, but both rely on observing large numbers of meioses. Large families with multiple affected individuals (or a wide range of quantitative trait values), in which all individuals have both phenotype and genotype data, are therefore most powerful.
  • Linkage analyses are particularly useful for detecting genetic variants with large effects on risk of disease or large effects on the variance of a quantitative trait, which are of particular utility for precision medicine.
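
As an illustration of how parametric linkage quantifies evidence, the following minimal sketch computes a two‐point LOD score in the simplest possible setting: phase‐known, fully informative meioses, each scored as recombinant or non‐recombinant. The LOD score at recombination fraction θ is the base‐10 logarithm of the likelihood at θ divided by the likelihood at θ = 0.5 (no linkage). The function names `lod_score` and `max_lod` and the grid search are illustrative assumptions and do not come from any linkage package; real pedigrees with unknown phase, incomplete penetrance or missing genotypes require a full pedigree likelihood.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses (an idealised setting).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        # A single observed recombinant excludes complete linkage.
        return -math.inf
    # log10 likelihood under linkage at theta (0 * log(0) treated as 0).
    log_l_theta = non_recombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_l_theta += recombinants * math.log10(theta)
    # log10 likelihood under no linkage (theta = 0.5).
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search for the theta in [0, 0.5] that maximises the LOD score."""
    best_theta, best_lod = 0.5, 0.0
    for i in range(steps + 1):
        theta = 0.5 * i / steps
        lod = lod_score(recombinants, meioses, theta)
        if lod > best_lod:
            best_theta, best_lod = theta, lod
    return best_theta, best_lod
```

Under these assumptions, `max_lod(1, 10)` places the maximum at θ = 0.1 with a LOD score of roughly 1.6, whereas `max_lod(0, 10)` gives θ = 0 with a LOD score of roughly 3.0. This shows why large numbers of informative meioses matter for power.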

Keywords: linkage; genetics; power; type 1 error; heterogeneity; model based; bias

Figure 1. Example of extended pedigrees exhibiting linkage between a disease locus (D) and four marker loci in a small region of a chromosome. The diagram illustrates how common variants can be linked along a haplotype within families even though a different disease allele (and a different linked haplotype) is present in each family. Here, we assume that the disease‐causing gene, D, is located between the second and third marker loci on the diagram and that the actual disease variants (D1, D2 and D3 in the first, second and third families, respectively) are rare, highly penetrant and have not been genotyped. In this example, we also assume that only carriers of a risk allele are affected with the disease. Markers A, B, C and D are assumed to be common variants (that is, to have a high minor allele frequency, MAF) and have been genotyped. All four markers are also located very close to each other on the same chromosome, so crossing over within families is unlikely in this small number of observed meioses. Within each family, all markers are clearly linked to the disease, because the black haplotype co‐segregates with the disease phenotype. Parametric linkage evaluates the likelihood of linkage at a particular recombination fraction between the unknown disease gene and the marker loci, based on co‐segregation patterns within the family and on the assumed genetic models for the disease and marker loci. Affected‐pair nonparametric linkage evaluates whether affected relative pairs share more alleles identical by descent than expected given their relationship.
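
The affected‐pair idea described in the Figure 1 caption can be sketched for the simplest case, affected sibling pairs. This is a minimal illustration, assuming that the number of alleles shared identical by descent (0, 1 or 2) is known without ambiguity for every pair and that pairs are independent. Under no linkage a sibling pair shares 0, 1 or 2 alleles with probabilities 1/4, 1/2 and 1/4, so the proportion shared has mean 1/2 and variance 1/8. The function name `asp_mean_test` is hypothetical, and the normal approximation used for the one‐sided p‐value is an assumption that is reasonable only for moderately large samples.

```python
import math


def asp_mean_test(ibd_counts):
    """One-sided 'mean' test for excess allele sharing in affected sib pairs.

    ibd_counts: iterable with one entry per pair, the number of alleles (0, 1, 2)
    shared identical by descent at the marker (assumed known exactly).
    Returns (mean proportion shared, z statistic, one-sided p-value).
    """
    counts = list(ibd_counts)
    n = len(counts)
    if n == 0:
        raise ValueError("at least one sib pair is required")
    if any(c not in (0, 1, 2) for c in counts):
        raise ValueError("each IBD count must be 0, 1 or 2")
    # Proportion of alleles shared, averaged over pairs.
    mean_prop = sum(counts) / (2 * n)
    # Null expectation 1/2; null variance of the mean is (1/8) / n.
    z = (mean_prop - 0.5) / math.sqrt(1.0 / (8 * n))
    # Upper-tail probability of a standard normal variate.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_prop, z, p_value
```

With 100 pairs sharing 2, 1 and 0 alleles in the null proportions 25:50:25, the statistic is exactly zero. Shifting the counts towards greater sharing, for example 40:50:10, yields a large positive z and a small p‐value.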
Figure 2. Example of one form of nonparametric linkage analysis of a quantitative trait in sibling pairs. The squared difference in the quantitative phenotypes of each sibling pair is regressed on the proportion of marker alleles that the pair shares identical by descent (i.b.d.). Sibs 1 and 2 have phenotypes y1 and y2. If there is no linkage between the marker locus and a gene that affects variation of the quantitative trait, then the regression line should be flat and the regression coefficient should be zero. If linkage is present, then the regression coefficient should be significantly less than zero: sibling pairs that share a higher proportion of alleles i.b.d. at a linked marker locus should also share a higher proportion of alleles i.b.d. at the trait locus, and thus should have more similar quantitative phenotypes (smaller squared differences) than pairs that share a lower proportion of alleles i.b.d.
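
The regression described in the Figure 2 caption, often called Haseman–Elston regression, can be sketched as an ordinary least‐squares fit. This is a minimal illustration, assuming one independent sibling pair per observation and an already estimated proportion of alleles shared i.b.d. for each pair. The function name `haseman_elston` is hypothetical, and the sketch returns only the fitted slope and intercept; a real analysis would also test whether the slope is significantly less than zero.

```python
def haseman_elston(pi_ibd, y1, y2):
    """Least-squares regression of squared sib-pair trait differences on IBD sharing.

    pi_ibd: proportion of marker alleles shared i.b.d. for each pair (0 to 1).
    y1, y2: quantitative phenotypes of the first and second sib in each pair.
    Returns (slope, intercept); a negative slope is the signature of linkage.
    """
    n = len(pi_ibd)
    if not (n == len(y1) == len(y2)):
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("at least two sib pairs are required")
    # Dependent variable: squared phenotype difference within each pair.
    sq_diff = [(a - b) ** 2 for a, b in zip(y1, y2)]
    mean_pi = sum(pi_ibd) / n
    mean_sq = sum(sq_diff) / n
    sxx = sum((p - mean_pi) ** 2 for p in pi_ibd)
    if sxx == 0:
        raise ValueError("IBD sharing must vary across pairs")
    sxy = sum((p - mean_pi) * (d - mean_sq) for p, d in zip(pi_ibd, sq_diff))
    slope = sxy / sxx
    intercept = mean_sq - slope * mean_pi
    return slope, intercept
```

In practice the i.b.d. proportions are themselves estimated from marker genotypes, and the error in those estimates is ignored in this sketch.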

Further Reading

Almasy L and Warren DM (2005) Software for quantitative trait analysis. Human Genomics 2 (3): 191–195.

Elston RC, Olson JM and Palmer L (eds) (2002) Biostatistical Genetics and Genetic Epidemiology, 1st edn. West Sussex: John Wiley & Sons, Ltd.

Elston R (2017) Statistical Human Genetics: Methods and Protocols (Methods in Molecular Biology), 2nd edn. New York: Humana Press.

Liang K‐Y and Self SG (1996) On the asymptotic behaviour of the pseudolikelihood ratio test statistic. Journal of the Royal Statistical Society, Series B (Methodological) 58: 785–796.

Olson JM, Witte JS and Elston RC (1999) Tutorial in biostatistics: genetic mapping of complex traits. Statistics in Medicine 18: 2961–2981.

Ott J (1999) Analysis of Human Genetic Linkage. Baltimore, MD: Johns Hopkins University Press.

Rao DC and Gu CC (eds) (2008) Genetic Dissection of Complex Traits, 2nd edn, vol. 60 (Advances in Genetics). London: Academic Press.

Whittemore AS (1996) Genome scanning for linkage: an overview. American Journal of Human Genetics 59: 704–716.

Wijsman EM (2012) The role of large pedigrees in an era of high‐throughput sequencing. Human Genetics 131 (10): 1555–1563. DOI: 10.1007/s00439-012-1190-2. Epub 20 June 2012.

How to Cite
Bailey‐Wilson, Joan E (Jun 2018) Parametric and Nonparametric Linkage Analysis. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005403.pub2]