Population Stratification, Adjustment for


Population stratification is a major concern in genetic association studies. Failure to control for it effectively can produce excess false‐positive results and can mask true associations. Many methods have been designed to adjust for population stratification; most fall into one of the following categories: (1) genomic control, (2) structured association, (3) principal component or multidimensional scaling adjustment, (4) the stratification score method and (5) other approaches. No method is likely to be superior in all situations. Care is needed to ensure that the assumptions of a method are met and that the method is used for its intended purpose.

Keywords: population stratification; genomic control; structured association; principal components; multidimensional scaling; stratification score
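
As an illustration of the first category, genomic control rescales association test statistics by an inflation factor estimated from the statistics themselves. The following is a minimal sketch, assuming 1-df chi-square statistics and the commonly used median-based estimator; the function name `genomic_control` and the simulated statistics are hypothetical and are not taken from this article.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) for 1-df association statistics."""
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    lam = np.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # common convention: never inflate the statistics
    return lam, chi2_stats / lam


# Hypothetical data: null statistics uniformly inflated by a factor of 1.3,
# mimicking the effect of stratification on genome-wide test statistics.
rng = np.random.default_rng(0)
null_stats = rng.standard_normal(20000) ** 2
inflated = 1.3 * null_stats

lam, adjusted = genomic_control(inflated)
print(f"estimated inflation factor: {lam:.3f}")
```

Dividing by a single factor assumes that stratification inflates all markers equally, which is one reason no single method suits every study.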

Figure 1.

Multidimensional scaling versus principal component approach. These figures show the clustering results using MDS and PCA with 5000 genome‐wide random autosomal SNPs from the HapMap project Phase I data. In the top panel, (a)–(c) are generated using the PCA approach as implemented in Eigenanalysis. In the bottom panel, (d)–(f) are generated using the MDS approach. Pairwise plots of the first three dimensions are presented. There is no apparent difference between the two approaches in their ability to visualise the ancestral differences among these populations. Multiple runs gave similar results. The signs of dimensions 1 and 2 in the MDS plots have been reversed (this does not change the relative location of each cluster) to match the geographical locations of the PCA clusters. CEU, CEPH Utah residents with ancestry from northern and western Europe; CHB, Han Chinese from Beijing, China; JPT, Japanese from Tokyo, Japan; YRI, Yoruba in Ibadan; MDS, multidimensional scaling; PCA, principal component analysis.
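
The close agreement between the two approaches can be illustrated on toy data. The sketch below assumes simulated genotypes (not HapMap data) and classical MDS on Euclidean distances, for which the coordinates coincide with PCA scores up to the sign of each axis; the MDS used for the figure may rely on a different distance, such as one based on allele sharing. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 3 populations, 30 individuals each, 500 SNPs.
n_per_pop, n_snps = 30, 500
freqs = rng.uniform(0.1, 0.9, size=(3, n_snps))  # per-population allele frequencies
labels = np.repeat(np.arange(3), n_per_pop)
genotypes = rng.binomial(2, freqs[labels]).astype(float)  # 0/1/2 allele counts

X = genotypes - genotypes.mean(axis=0)  # centre each SNP

# PCA: individual scores from the SVD of the centred genotype matrix.
U, s, _ = np.linalg.svd(X, full_matrices=False)
pca_scores = U[:, :2] * s[:2]

# Classical MDS: double-centre squared Euclidean distances, then eigendecompose.
sq_norms = (X ** 2).sum(axis=1)
D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
n = X.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]
mds_coords = eigvecs[:, order] * np.sqrt(eigvals[order])

# The sign of each axis is arbitrary, so align MDS axes with the PCA axes.
signs = np.sign((mds_coords * pca_scores).sum(axis=0))
mds_aligned = mds_coords * signs

max_diff = np.abs(mds_aligned - pca_scores).max()
print(f"largest coordinate difference after sign alignment: {max_diff:.2e}")
```

Flipping the sign of an axis, as done for dimensions 1 and 2 in the figure, mirrors the plot without changing the distances between individuals or clusters.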




Further Reading

Hartl DL and Clark AG (2007) Principles of Population Genetics, 4th edn. Sunderland, MA: Sinauer Associates Inc.

Weir BS (1996) Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sunderland, MA: Sinauer Associates Inc.

Weir BS and Hill WG (2002) Estimating F‐statistics. Annual Review of Genetics 36: 721–750.


How to Cite
Gao, Xiaoyi, and Edwards, Todd L (Oct 2010) Population Stratification, Adjustment for. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0020384]