Genome‐Wide Association Studies in Plants

Abstract

Inexpensive genome sequencing has made it possible to search for genomic variants called single nucleotide polymorphisms (SNPs) in hundreds of individuals. Linking these variants to phenotypes is the main goal of genome‐wide association studies (GWAS). SNPs can be discovered and called using different technologies and methods, and the subsequent quality control must take into account both the study species and the genotyping technique. GWAS can be performed using different mathematical approaches, which are implemented in the current range of software packages used to run the analysis and interpret its results.

Key Concepts

  • GWAS is a powerful tool to associate genomic variants with phenotypes.
  • Quality control is core to any good GWAS.
  • Numerous powerful tools now exist to make running a GWAS straightforward.
  • Interpreting the output is still a challenge, especially in the presence of hidden confounding factors.
  • GWAS can be performed using different types of genotyping data, each with its own advantages and disadvantages.
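
As a rough illustration of the quality-control step mentioned above, the following Python sketch filters SNPs by call rate and minor allele frequency (MAF). The genotype coding (0, 1 or 2 copies of the alternate allele, None for a missing call), the function names and the thresholds (call rate of at least 0.9, MAF of at least 0.05) are assumptions made for this example only; suitable cut-offs depend on the species and the genotyping technology, and the calculation assumes diploid calls.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype at one SNP."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Frequency of the rarer allele among non-missing diploid calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(calls: List[Genotype],
              min_call_rate: float = 0.9,   # illustrative threshold
              min_maf: float = 0.05) -> bool:  # illustrative threshold
    """Keep a SNP only if it is well genotyped and not too rare."""
    return (call_rate(calls) >= min_call_rate
            and minor_allele_frequency(calls) >= min_maf)


# Toy data: three hypothetical SNPs genotyped in twelve individuals.
snps: Dict[str, List[Genotype]] = {
    "snp_common": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2, 1, 1],
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "snp_missing": [0, None, None, 2, 1, None, 1, 0, 2, 1, 1, 0],
}

kept = [name for name, calls in snps.items() if passes_qc(calls)]
print(kept)
```

In this toy data set only the first SNP survives: the second is too rare (one alternate allele in 24) and the third has a call rate of 0.75. Polyploid crops generally need a different allele-frequency calculation from the diploid one sketched here.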

Keywords: genome‐wide association studies (GWAS); generalised linear models (GLMs); mixed linear models (MLMs); Brassica; wheat

Figure 1. Example of a well‐adjusted Manhattan plot, with a few identified SNPs in red above the cut‐off line.
Figure 2. Example of a GWAS where the majority of SNPs are spuriously linked with the phenotype.
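
The two situations in these figures can be told apart numerically as well as visually. The Python sketch below computes a Bonferroni cut-off on the -log10(p) scale, which is one common (and conservative) way to place the line in a Manhattan plot, and the genomic inflation factor, which is close to 1 for a well-adjusted analysis and well above 1 when many SNPs are spuriously associated. Neither is stated to be the method behind the figures; the function names, the significance level of 0.05 and the simulated p-values are assumptions for illustration.

```python
import math
from statistics import NormalDist, median


def bonferroni_cutoff(n_tests: int, alpha: float = 0.05) -> float:
    """Height of the cut-off line on the -log10(p) axis of a Manhattan plot."""
    return -math.log10(alpha / n_tests)


def genomic_inflation(p_values) -> float:
    """Median observed chi-square (1 df) divided by its expected median.

    Each p-value must lie in (0, 1]; a p-value of exactly 0 is not supported.
    """
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic as z squared.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median


# Hypothetical p-values: evenly spread (no inflation) versus skewed towards zero.
well_behaved = [(i + 0.5) / 1000 for i in range(1000)]
inflated = [p ** 2 for p in well_behaved]

print(round(bonferroni_cutoff(100_000), 2))
print(round(genomic_inflation(well_behaved), 2))
print(round(genomic_inflation(inflated), 2))
```

An inflation factor far above 1, as for the second set of p-values, suggests that population structure or relatedness has not been accounted for, rather than that most SNPs are truly associated with the trait.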

Further Reading

Voorman A, Lumley T, McKnight B and Rice K (2011) Behavior of QQ‐plots and genomic control in studies of gene‐environment interaction. PLoS One 6: e19416. – A very useful explanation of Q‐Q plots.

Jeff B. How to read a genome‐wide association study. http://genomesunzipped.org/2010/07/how-to-read-a-genome-wide-association-study.php – A tutorial on reading GWAS results for the uninitiated.

Lawson DJ, van Dorp L and Falush D (2018) A tutorial on how not to over‐interpret STRUCTURE and ADMIXTURE bar plots. Nature Communications 9: 3258. – STRUCTURE plots are easy to misinterpret; this paper helps readers avoid the common pitfalls.

Duke P. Use of GAPIT for Genome Wide Association Studies. http://pbgworks.org/sites/pbgworks.org/files/GAPIT_with_SYslides.pdf – A tutorial on how to run GAPIT and interpret its output.

The TASSEL 5 User Manual. https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual.

Bush WS and Moore JH (2012) Chapter 11: Genome‐wide association studies. PLoS Computational Biology 8 (12): e1002822. – A much more in‐depth overview of the concepts behind GWAS, with a focus on humans.

Scheben A, Batley J and Edwards D (2017) Genotyping‐by‐sequencing approaches to characterize crop genomes: choosing the right tool for the right application. Plant Biotechnology Journal 15: 149–161. – A very useful discussion of when to use which genotyping technology in plants.

How to Cite
Anderson, Robyn, Edwards, David, Batley, Jacqueline, and Bayer, Philipp Emanuel (Feb 2019) Genome‐Wide Association Studies in Plants. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0027950]