Gene Copy‐Number Changes in Evolution


High‐throughput genomics has revealed widespread copy‐number differences within and among populations of species belonging to diverse taxonomic groups. Experimental evolution in model organisms under minimal natural selection shows that the spontaneous rates of gene duplication and deletion, estimated empirically across whole genomes, are extremely high and contribute to the abundance of copy‐number variation (CNV). CNVs are, on average, deleterious with respect to fitness, but their high spontaneous rates of origin can also facilitate rapid adaptation to novel environmental challenges. Duplications and deletions constitute opposing forces that shape genome complexity and size. Gene duplications are the ultimate source of new genes that confer novel phenotypes, whereas deletions remove superfluous genetic material. Furthermore, CNVs can contribute to the evolution of genetic incompatibility and speciation. Future challenges for understanding the evolutionary potential of CNVs include elucidating the relative roles of genetic drift and natural selection in the maintenance of CNVs in populations.

Key Concepts

  • Gene duplications are naturally occurring mutations within genomes wherein genic material is duplicated, resulting in additional copies of the duplicated gene.
  • Gene deletions are mutations where genic material has been deleted, thereby reducing the number of copies of the deleted gene.
  • Gene duplications and deletions have resulted in extensive gene copy‐number variation (CNV) in populations across all domains of life.
  • The evolution of gene content and genome size is the net result of duplications, which add new genetic material and contribute to the evolution of new genes, and deletions, which continuously remove genetic information.
  • The study of copy‐number variants at single loci has a long history in population genetics, but a systematic analysis of CNVs across whole genomes became possible only after technical breakthroughs in DNA microarray technology and whole‐genome sequencing.
  • The rates of spontaneous gene duplication and deletion per gene are extraordinarily high and usually much higher than the nucleotide substitution rates.
  • Most CNVs in natural populations are deleterious.
  • Both duplications and deletions can contribute to adaptive genetic variation in natural and experimental populations.
  • The long‐term maintenance of duplicated genes, which contribute to the evolution of novel genes, can be achieved by both positive selection on gene copy‐number and subfunctionalisation resulting from the loss of partial functions of complementary gene copies.
  • Gene duplication and subfunctionalisation can contribute to reproductive isolation and speciation.
  • A comprehensive understanding of how CNVs contribute to the evolution of genomes will require a combination of high‐throughput genomics with analysis of the functional and fitness consequences of CNVs in experimental and natural populations.

Keywords: gene deletion; gene duplication; copy‐number variation; fitness; speciation; subfunctionalisation; neofunctionalisation; genome; evolution; mutation

Figure 1. Schematic displaying the contributions of copy‐number changes to genome size. Genome expansion occurs via the gain of new sequences by gene and genome (not shown) duplication. Genome contraction proceeds by loss of existing duplicate segments and deletions of unique DNA segments.
Figure 2. Rates of gene duplication estimated from (1) locus‐specific assays, (2) population frequencies of copy‐number variants, (3) bioinformatic analyses of initially sequenced genomes of model organisms and (4) empirical genome‐wide analyses of mutation accumulation experiments. Estimates of the duplication rate (duplications/gene/generation) vary by several orders of magnitude across these studies, which employ different approaches, and are plotted on a logarithmic scale to facilitate comparison.
Figure 3. Rates of gene deletion estimated from (1) locus‐specific assays, (2) population frequencies of copy‐number variants and (3) empirical genome‐wide analyses of mutation accumulation experiments. Estimates of the deletion rate (deletions/gene/generation) vary by several orders of magnitude across these studies, which employ different approaches, and are plotted on a logarithmic scale to facilitate comparison.
Figure 4. The origin of genetic incompatibility via reciprocal silencing of duplicate genes (adapted from Lynch and Force, 2000). Consider an ancestral gene comprising three functional regulatory regions (small yellow squares), which control gene expression, and a coding region (larger green rectangle). Duplication of the ancestral locus and its associated regulatory regions yields structurally and functionally redundant duplicate loci A and B, each possessing the full repertoire of ancestral function. The ancestral population subsequently splits into two geographically isolated subpopulations: subpopulation 1 with duplicate copies A1 and B1, and subpopulation 2 with duplicate copies A2 and B2. Given the deleterious nature of most newly occurring mutations, each descendant subpopulation is expected to accumulate degenerative silencing (nonfunctionalizing) mutations at one of the two duplicate loci. Although the mutations are in themselves deleterious (loss‐of‐function), they accumulate in a neutral fashion within each subpopulation because the redundancy afforded by the extra gene copy maintains the full ancestral function irrespective of the accumulating mutations. This divergent resolution of the duplicate copies in the two subpopulations is represented in the schematic. Subpopulation 1 gains nonfunctionalizing mutations in the first and third regulatory regions of copy A1 and in the second regulatory region of copy B1 (represented by white shaded boxes). However, within subpopulation 1, the presence of at least one functional copy of all three regulatory regions across the two duplicate copies maintains the ancestral expression. Likewise, although subpopulation 2 loses function of the first and second regulatory regions in copy A2 and of the third regulatory region in copy B2, the ancestral expression profile is preserved because at least one functional version of all three regulatory regions is present between gene duplicates A2 and B2. Therefore, the nonfunctionalizing mutations are neutral with respect to fitness within each subpopulation. However, in a hybrid background, independent assortment of these alleles can result in the inheritance of nonfunctional modules across both duplicate copies, thereby leading to sterility or inviability. In this particular scenario, half of the F1 hybrid gametes are inviable and one‐eighth of the F2 hybrids will be double‐homozygotes for nonfunctional paralogs, leading to inviability or sterility if the ancestral gene function is necessary for fitness‐related traits. For example, F2 zygotes with genotype A1A1B2B2 lack a functional third regulatory region, leading to loss of one ancestral subfunction. F2 zygotes with genotype A2A2B1B1 carry only silenced alleles for the second regulatory region, leading to loss of another ancestral subfunction.
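The proportions quoted in the Figure 4 caption (one half of F1 gametes, one‐eighth of F2 zygotes) can be checked by direct enumeration. The following is a minimal sketch rather than the authors' own calculation. It assumes that loci A and B are unlinked and assort independently, that all gamete classes are equally frequent, and that a gamete or zygote counts as deficient when none of its alleles retains a functional copy of at least one regulatory region. The names FUNCTIONAL and missing_regions are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Functional regulatory regions retained by each allele, as described in the
# Figure 4 caption (regions are numbered 1-3).
FUNCTIONAL = {
    "A1": {2},      # A1 lost regions 1 and 3
    "B1": {1, 3},   # B1 lost region 2
    "A2": {3},      # A2 lost regions 1 and 2
    "B2": {1, 2},   # B2 lost region 3
}
ALL_REGIONS = {1, 2, 3}


def missing_regions(alleles):
    """Return the ancestral regions that no allele in `alleles` still provides."""
    covered = set().union(*(FUNCTIONAL[a] for a in alleles))
    return ALL_REGIONS - covered


# An F1 hybrid is A1A2 B1B2. Assuming independent assortment, each gamete gets
# one A allele and one B allele, with all four combinations equally likely.
gametes = list(product(["A1", "A2"], ["B1", "B2"]))
deficient_gametes = [g for g in gametes if missing_regions(g)]
gamete_fraction = Fraction(len(deficient_gametes), len(gametes))

# An F2 zygote combines two F1 gametes drawn independently (16 ordered pairs).
zygotes = list(product(gametes, repeat=2))
deficient_zygotes = [z for z in zygotes if missing_regions(z[0] + z[1])]
zygote_fraction = Fraction(len(deficient_zygotes), len(zygotes))

print("F1 gametes lacking a subfunction:", gamete_fraction)   # 1/2
print("F2 zygotes lacking a subfunction:", zygote_fraction)   # 1/8
```

Under these assumptions the only deficient F2 genotypes are A1A1B2B2 (no functional third region) and A2A2B1B1 (no functional second region), each arising with probability 1/16. Linkage between the two loci would change these proportions.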

References


Anderson RP and Roth JR (1977) Tandem genetic duplications in phage and bacteria. Annual Review of Microbiology 31: 473–505.

Anderson RP and Roth J (1981) Spontaneous tandem genetic duplications in Salmonella typhimurium arise by unequal recombination between rRNA (rrn) cistrons. Proceedings of the National Academy of Sciences of the United States of America 78: 3113–3117.

Andersson DI and Hughes D (2009) Gene amplification and adaptive evolution in bacteria. Annual Review of Genetics 43: 167–195.

Bensasson D, Feldman MW and Petrov DA (2003) Rates of DNA duplication and mitochondrial DNA insertion in the human genome. Journal of Molecular Evolution 57: 343–354.

Bergthorsson U, Andersson DI and Roth JR (2007) Ohno's dilemma: evolution of new genes under continuous selection. Proceedings of the National Academy of Sciences of the United States of America 104: 17004–17009.

Bikard D, Patel D, Le Metté C, et al. (2009) Divergent evolution of duplicate genes leads to genetic incompatibilities within A. thaliana. Science 323: 623–626.

Blount ZD, Barrick JE, Davidson CJ and Lenski RE (2012) Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489: 513–518.

Brawand D, Wagner CE, Li YI, et al. (2014) The genomic substrate for adaptive radiation in African cichlid fish. Nature 513: 375–381.

Buggs RJA, Chamala S, Wu W, et al. (2012) Rapid, repeated, and clustered loss of duplicate genes in allopolyploid plant populations of independent origin. Current Biology 22: 248–252.

Carreto L, Eiriz MF, Gomes AC, et al. (2008) Comparative genomics of wild type yeast strains unveils important genome diversity. BMC Genomics 9: 17.

Chan YF, Marks ME, Jones FC, et al. (2010) Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327: 302–305.

Cheeseman IH, Miller B, Tan JC, et al. (2015) Population structure shapes copy number variation in malaria parasites. Molecular Biology and Evolution. DOI: 10.1093/molbev/msv282.

Chen W‐K, Swartz JD, Rush LJ and Alvarez CE (2009) Mapping DNA structural variation in dogs. Genome Research 19: 500–509.

Conrad DF, Andrews TD, Carter NP, Hurles ME and Pritchard JK (2006) A high‐resolution survey of deletion polymorphism in the human genome. Nature Genetics 38: 75–81.

Conrad DF, Pinto D, Redon R, et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature 464: 704–712.

Cooper VS, Schneider D, Blot M and Lenski RE (2001) Mechanisms causing rapid and parallel losses of ribose catabolism in evolving populations of Escherichia coli B. Journal of Bacteriology 183: 2834–2841.

Cornetti L, Valente L, Dunning LT, et al. (2015) The genome of the ‘great speciator’ provides insights into bird diversification. Genome Biology and Evolution 7: 2680–2691.

Denver DR, Dolan PC, Wilhelm LJ, et al. (2009) A genome‐wide view of Caenorhabditis elegans base‐substitution mutation processes. Proceedings of the National Academy of Sciences of the United States of America 106: 16310–16314.

Diamond JM, Gilpin ME and Mayr E (1976) Species‐distance relation for birds of the Solomon Archipelago, and the paradox of the great speciators. Proceedings of the National Academy of Sciences of the United States of America 73: 2160–2164.

Elliott KT, Cuff LE and Neidle EL (2013) Copy number change: evolving views on gene amplification. Future Microbiology 8: 887–899.

Emerson JJ, Cardoso‐Moreira M, Borevitz JO and Long M (2008) Natural selection shapes genome‐wide patterns of copy‐number polymorphism in Drosophila melanogaster. Science 320: 1629–1631.

Farslow JC, Lipinski KJ, Packard LB, et al. (2015) Rapid increase in frequency of gene copy‐number variants during experimental evolution in Caenorhabditis elegans. BMC Genomics 16: 1044.

Force A, Lynch M, Pickett FB, et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531–1545.

Gao LZ and Innan H (2004) Very low gene duplication rate in the yeast genome. Science 306: 1367–1370.

Gelbart WM and Chovnick A (1979) Spontaneous unequal exchange in the rosy region of Drosophila melanogaster. Genetics 92: 849–859.

Gonzalez E, Kulkarni H, Bolivar H, et al. (2005) The influence of CCL3L1 gene‐containing segmental duplications on HIV‐1/AIDS susceptibility. Science 307: 1434–1440.

Graubert TA, Cahan P, Edwin D, et al. (2007) A high‐resolution map of segmental DNA copy number variation in the mouse genome. PLoS Genetics 3: e1.

Greenblum S, Carr R and Borenstein E (2015) Extensive strain‐level copy‐number variation across human gut microbiome species. Cell 160: 583–594.

Guryev V, Saar K, Adamovic T, et al. (2008) Distribution and functional impact of DNA copy number variation in the rat. Nature Genetics 40: 538–545.

Haag‐Liautard C, Dorris M, Maside X, et al. (2007) Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature 445: 82–85.

Haldane JBS (1935) The rate of spontaneous mutation of a human gene. Journal of Genetics 31: 317–326.

Halligan DJ and Keightley PD (2009) Spontaneous mutation accumulation studies in evolutionary genetics. Annual Review of Ecology, Evolution, and Systematics 40: 151–172.

Horiuchi T, Horiuchi S and Novick A (1963) The genetic basis of hypersynthesis of beta‐galactosidase. Genetics 48: 157–169.

Hughes AL (1994) The evolution of functionally novel proteins after gene duplication. Proceedings of the Royal Society B: Biological Sciences 256: 119–124.

Hughes TR, Roberts CJ, Dai HY, et al. (2000) Widespread aneuploidy revealed by DNA microarray expression profiling. Nature Genetics 25: 333–337.

Iafrate AJ, Feuk L, Rivera MN, et al. (2004) Detection of large‐scale variation in the human genome. Nature Genetics 36: 949–951.

Itsara A, Wu H, Smith JD, et al. (2010) De novo rates and selection of large copy number variation. Genome Research 20: 1469–1481.

Kan YW, Dozy AM, Varmus HE, et al. (1975) Deletion of α‐globin genes in haemoglobin‐H disease demonstrates multiple α‐globin structural loci. Nature 255: 255–256.

Katju V (2012) In with the old, in with the new: the promiscuity of the duplication process engenders diverse pathways for novel gene creation. International Journal of Evolutionary Biology 2012: 341932.

Katju V, Farslow JC and Bergthorsson U (2009) Variation in gene duplicates with low synonymous divergence in Saccharomyces cerevisiae relative to Caenorhabditis elegans. Genome Biology 10: R75.

Katju V and Bergthorsson U (2013) Copy‐number changes in evolution: rate, fitness effects and adaptive significance. Frontiers in Genetics 4: Article 273.

Keightley PD, Trivedi U, Thomson M, et al. (2009) Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Research 19: 1195–1201.

Keith N, Tucker AE, Jackson CE, et al. (2016) High mutational rates of large‐scale duplication and deletion in Daphnia pulex. Genome Research 26: 60–69.

King M‐C and Wilson AC (1975) Evolution at two levels in humans and chimpanzees. Science 188: 107–116.

Kondrashov FA (2012) Gene duplication as a mechanism of genomic adaptation to a changing environment. Proceedings of the Royal Society B: Biological Sciences 279: 5048–5057.

Koskiniemi S, Sun S, Berg OG and Andersson DI (2012) Selection‐driven gene loss in bacteria. PLoS Genetics 8: e1002787.

Kugelberg E, Kofoid E, Andersson DI, et al. (2006) Multiple pathways of selected gene amplification during adaptive mutation. Proceedings of the National Academy of Sciences of the United States of America 103: 17319–17324.

Kuo CH and Ochman H (2010) The extinction dynamics of bacterial pseudogenes. PLoS Genetics 6: e1001050.

Lam K‐WG and Jeffreys AJ (2006) Processes of copy‐number change in human DNA: the dynamics of alpha‐globin gene deletion. Proceedings of the National Academy of Sciences of the United States of America 103: 8921–8927.

Lam K‐WG and Jeffreys AJ (2007) Processes of de novo duplication of human α‐globin genes. Proceedings of the National Academy of Sciences of the United States of America 104: 10950–10955.

Lang GI, Murray AW and Botstein D (2009) The cost of gene expression underlies a fitness trade‐off in yeast. Proceedings of the National Academy of Sciences of the United States of America 106: 5755–5760.

Langley CH, Stevens K, Cardeno C, et al. (2012) Genomic variation in natural populations of Drosophila melanogaster. Genetics 192: 533–598.

Langridge J (1969) Mutations conferring quantitative and qualitative increases in beta‐galactosidase activity in Escherichia coli. Molecular and General Genetics 105: 74–83.

Lee M‐C and Marx CJ (2012) Repeated selection‐driven genome reduction of accessory genes in experimental populations. PLoS Genetics 8: e1002651.

Lenormand T, Guillemaud T, Bourget D and Raymond M (1998) Appearance and sweep of a gene duplication: adaptive response and potential for new function in the mosquito Culex pipiens. Evolution 52: 1705–1712.

Lipinski KJ, Farslow JC, Fitzpatrick KA, et al. (2011) High spontaneous rate of gene duplication in Caenorhabditis elegans. Current Biology 21: 306–310.

Locke DP, Sharp AJ, McCarroll SA, et al. (2006) Linkage disequilibrium and heritability of copy‐number polymorphisms within duplicated regions of the human genome. American Journal of Human Genetics 79: 275–290.

Long M, VanKuren NW, Chen S and Vibranovski MD (2013) New gene evolution: little did we know. Annual Review of Genetics 47: 307–333.

Lupski JR (2007) Genomic rearrangements and sporadic disease. Nature Genetics 39: S43–S47.

Lynch M and Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155.

Lynch M and Force AG (2000) The origin of interspecific genomic incompatibility via gene duplication. The American Naturalist 156: 590–605.

Lynch M and Conery JS (2003) The evolutionary demography of duplicate genes. Journal of Structural and Functional Genomics 3: 35–44.

Lynch M (2007) The Origins of Genome Architecture. Sunderland, MA: Sinauer.

Lynch M, Sung W, Morris K, et al. (2008) A genome‐wide view of the spectrum of spontaneous mutations in yeast. Proceedings of the National Academy of Sciences of the United States of America 105: 9272–9277.

Maroni G, Wise J, Young JE and Otto E (1987) Metallothionein gene duplications and metal tolerance in natural populations of Drosophila melanogaster. Genetics 117: 739–744.

Maydan JS, Lorch A, Edgley ML, et al. (2010) Copy number variation in the genomes of twelve natural isolates of Caenorhabditis elegans. BMC Genomics 11: 62.

McCune A (1997) How fast is speciation? Molecular, geological, and phylogenetic evidence from adaptive radiations of fishes. In: Givnish TJ and Sytsma KJ (eds) Molecular Evolution and Adaptive Radiation, pp. 585–610. Cambridge, UK: Cambridge University Press.

Menez J, Remy E and Buckingham RH (2001) Suppression of thermosensitive peptidyl‐tRNA hydrolase mutation in Escherichia coli by gene duplication. Microbiology 147: 1581–1589.

Mira A, Ochman H and Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends in Genetics 17: 589–596.

Mizuta Y, Harushima Y and Kurata N (2010) Rice pollen hybrid incompatibility caused by reciprocal loss of duplicated genes. Proceedings of the National Academy of Sciences of the United States of America 107: 20417–20422.

Mukai T (1964) The genetic structure of natural populations of Drosophila melanogaster. I. Spontaneous mutation rate of polygenes controlling viability. Genetics 50: 1–19.

Nakabachi A, Yamashita A, Toh H, et al. (2006) The 160‐kilobase genome of the bacterial endosymbiont Carsonella. Science 314: 267.

Näsvall J, Sun L, Roth JR and Andersson DI (2012) Real‐time evolution of new genes by innovation, amplification and divergence. Science 338: 384–387.

Neale DB, Wegrzyn JL, Stevens KA, et al. (2014) Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biology 15: R59.

Nicholas TJ, Cheng Z, Ventura M, et al. (2009) The genomic architecture of segmental duplications and associated copy‐number variants in dogs. Genome Research 19: 491–499.

Nilsson AI, Koskiniemi S, Eriksson S, et al. (2005) Bacterial genome size reduction by experimental evolution. Proceedings of the National Academy of Sciences of the United States of America 102: 12112–12116.

Ohnishi O (1977) Spontaneous and ethyl methanesulfonate‐induced mutations controlling viability in Drosophila melanogaster. I. Recessive lethal mutations. Genetics 87: 519–527.

Ohta T (1988) Time for acquiring a new gene by duplication. Proceedings of the National Academy of Sciences of the United States of America 85: 3509–3512.

Ohno S (1970) Evolution by Gene Duplication. New York: Springer‐Verlag.

Ossowski S, Schneeberger K, Clark RM, et al. (2008) Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Research 18: 2024–2033.

Pan D and Zhang L (2007) Quantifying the major mechanisms of recent gene duplications in the human and mouse genomes: a novel strategy to estimate gene duplication rates. Genome Biology 8: R158.

Patrick WM, Quandt EM, Swartzlander DB and Matsumura I (2007) Multicopy suppression underpins metabolic evolvability. Molecular Biology and Evolution 24: 2716–2722.

Perry GH, Tchinda J, McGrath SD, et al. (2006) Hotspots for copy number variation in chimpanzees and humans. Proceedings of the National Academy of Sciences of the United States of America 103: 8006–8011.

Perry GH, Dominy NJ, Claw KG, et al. (2007) Diet and the evolution of human amylase gene copy number variation. Nature Genetics 39: 1256–1260.

Proulx SR and Phillips PC (2006) Allelic divergence precedes and promotes gene duplication. Evolution 60: 881–892.

Rau MH, Marvig RL, Ehrlich GD, Molin S and Jelsbak L (2012) Deletion and acquisition of genomic content during early stage adaptation of Pseudomonas aeruginosa to a human host environment. Environmental Microbiology 14: 2200–2211.

Reams AB, Kofoid E, Savageau E and Roth JR (2010) Duplication frequency in a population of Salmonella enterica rapidly approaches steady state with or without recombination. Genetics 184: 1077–1094.

Redon R, Ishikawa S, Fitch KR, et al. (2006) Global variation in copy number in the human genome. Nature 444: 444–454.

Rokyta DR, Joyce P, Caudle SB and Wichman HA (2005) An empirical test of the mutational landscape model of adaptation using a single‐stranded DNA virus. Nature Genetics 37: 441–444.

Romero D and Palacios R (1997) Gene amplification and genomic plasticity in prokaryotes. Annual Review of Genetics 31: 91–111.

Scannell DR, Byrne KP, Gordon JL, Wong S and Wolfe KH (2006) Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature 440: 341–345.

Schrider DR and Hahn MW (2010) Gene copy‐number polymorphism in nature. Proceedings of the Royal Society B: Biological Sciences 277: 3213–3221.

Schrider DR, Houle D, Lynch M, et al. (2013) Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.

Sebat J, Lakshmi B, Troge J, et al. (2004) Large‐scale copy number polymorphism in the human genome. Science 305: 525–528.

Shapira SK and Finnerty VG (1986) The use of genetic complementation in the study of eukaryotic macromolecular evolution: rate of spontaneous gene duplication at two loci of Drosophila melanogaster. Journal of Molecular Evolution 23: 159–167.

Spofford JB (1969) Heterosis and the evolution of duplications. The American Naturalist 103: 407–432.

Springer NM, Ying K, Fu Y, et al. (2009) Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genetics 5: e1000734.

Starlinger P (1977) DNA rearrangements in prokaryotes. Annual Review of Genetics 11: 103–126.

Sturtevant AH (1925) The effects of unequal crossing over at the bar locus in Drosophila. Genetics 10: 117–147.

Sudmant PH, Mallick S, Nelson BJ, et al. (2015) Global diversity, population stratification, and selection of human copy‐number variation. Science 349: aab3761.

Tautz D (2014) The discovery of de novo gene evolution. Perspectives in Biology and Medicine 57: 149–161.

Turner DJ, Miretti M, Rajan D, et al. (2008) Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nature Genetics 40: 90–95.

Van Ommen GJ‐B (2005) Frequency of new copy number variation in humans. Nature Genetics 37: 333–334.

Werth CR and Windham MD (1991) A model for divergent, allopatric speciation of polyploid pteridophytes resulting from silencing of duplicate‐gene expression. The American Naturalist 137: 515–526.

Wolf YI and Koonin EV (2013) Genome reduction as the dominant mode of evolution. Bioessays 35: 827–837.

Wolfe KH and Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387: 708–713.

Yamagata Y, Yamamoto E, Aya K, et al. (2010) Mitochondrial gene in the nuclear genome induces reproductive barrier in rice. Proceedings of the National Academy of Sciences of the United States of America 107: 1494–1499.

Yampolsky LY and Stoltzfus A (2001) Bias in the introduction of variation as an orienting factor in evolution. Evolution & Development 3: 73–83.

Further Reading

Alkan C, Coe BP and Eichler EE (2011) Genome structural variation discovery and genotyping. Nature Reviews Genetics 12: 363–376.

Conant GC and Wolfe KH (2008) Turning a hobby into a job: how duplicated genes find new functions. Nature Reviews Genetics 9: 938–950.

Hastings PJ, Lupski JR, Rosenberg SM and Ira G (2009) Mechanisms of change in gene copy number. Nature Reviews Genetics 10: 551–564.

Nair S, Nash D, Sudimack D, et al. (2007) Recurrent gene amplification and soft selective sweeps during evolution of multidrug resistance in malaria parasites. Molecular Biology and Evolution 24: 562–573.

How to Cite

Bergthorsson, Ulfar, and Katju, Vaishali (May 2016) Gene Copy‐Number Changes in Evolution. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0026319]