Analysis of Gene–Gene Interactions Underlying Human Disease


Following the identification of disease‐susceptibility variants in genome‐wide association studies using standard single‐locus analyses, the discovery process is shifting towards gene–gene interactions of functional importance in the pathophysiology and aetiology of complex diseases. The results of gene–gene interaction analyses could lead to new genetic findings that help account for the heritability of human diseases, and, through later bench science research and clinical applications, to novel insights into the underlying genetic aetiology. To facilitate gene–gene interaction analyses, various statistical methods have been proposed, each of which is applicable to particular study designs and has its own advantages under certain conditions. In this article, the authors survey the statistical methods and software packages currently available for population‐based and family‐based gene–gene interaction studies. The strengths of each method are discussed, and the difficulties in determining the relationship between biological and statistical interactions are laid out.

Key Concepts:

  • A biological interaction describes a scenario in which two or more genes jointly affect a disease.

  • A statistical interaction describes a departure from additivity of the genes' effects, on the chosen scale, in a generalised linear model.

  • The heritability of a phenotype is defined as the proportion of phenotypic variance among individuals that is attributable to their genetic differences.

  • A population‐based case‐control study recruits individuals with a disease of interest along with unrelated healthy individuals, and compares the allele/genotype distributions between cases and controls to determine whether a statistical interaction exists.

  • A family‐based study design avoids the potential confounding due to population stratification and admixture by recruiting the parents and/or siblings of the cases.
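
To make the notion of a statistical interaction in a case‐control study concrete, the following is a minimal sketch rather than any method from this article. It assumes two loci coded as binary carrier status (0/1) and a saturated logistic model, in which the interaction coefficient equals the log of the ratio of the joint odds ratio to the product of the two single‐locus odds ratios. The function name `interaction_log_or`, the Woolf‐type standard error and all counts are illustrative assumptions.

```python
import math

def interaction_log_or(cases, controls):
    """Interaction term of a saturated logistic model for two binary loci.

    cases / controls: dicts mapping (a, b) carrier status (0 or 1 at each
    locus) to counts. All counts must be positive.
    Returns (beta_a, beta_b, beta_ab, se_ab, p_value).
    """
    def log_odds(a, b):
        # Log odds of being a case within one genotype combination.
        return math.log(cases[(a, b)] / controls[(a, b)])

    base = log_odds(0, 0)
    beta_a = log_odds(1, 0) - base            # effect of locus A alone
    beta_b = log_odds(0, 1) - base            # effect of locus B alone
    # Departure from additivity on the log-odds (multiplicative) scale.
    beta_ab = log_odds(1, 1) - base - beta_a - beta_b

    # Woolf-type standard error: square root of the sum of reciprocal counts.
    se_ab = math.sqrt(sum(1.0 / n
                          for table in (cases, controls)
                          for n in table.values()))
    z = beta_ab / se_ab
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta_a, beta_b, beta_ab, se_ab, p_value

# Hypothetical counts, for illustration only.
example_cases = {(0, 0): 100, (1, 0): 120, (0, 1): 110, (1, 1): 200}
example_controls = {(0, 0): 200, (1, 0): 160, (0, 1): 170, (1, 1): 100}

beta_a, beta_b, beta_ab, se_ab, p_value = interaction_log_or(
    example_cases, example_controls)
print("interaction odds ratio:", math.exp(beta_ab))
print("standard error (log scale):", se_ab)
print("two-sided p-value:", p_value)
```

An interaction coefficient near zero means the two loci combine multiplicatively on the odds scale. As the key concepts above note, this is a statement about the chosen model scale, so it does not by itself establish or rule out a biological interaction.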

Keywords: epistasis; study design; population‐based study; family‐based study; regression‐based methods; data mining approaches; high‐dimensional data; high‐order interactions



How to Cite
Wen, Yalu, and Lu, Qing (Jan 2014) Analysis of Gene–Gene Interactions Underlying Human Disease. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0022498]