G+C Content Evolution in the Human Genome


Guanine (G) and cytosine (C) together make up approximately 40% of the nucleotide bases in the human genome. Around this genome‐wide average, large chromosomal regions alternate between high and low local G+C content. The (G+C)‐rich regions are typically gene rich, replicate earlier and have higher recombination activity. Many aspects of the evolution of base composition that led to this local G+C variation have been debated, including the origin of the alternating G+C structure; whether (G+C)‐rich regions are vanishing or emerging; which mutational or nonmutational mechanisms mainly drive G+C content; and whether G+C content is determined by neutral evolution, by selection or by both. A particularly promising scenario for G+C evolution considers open and closed chromatin structures, which naturally provide different mutational environments for different chromosomal regions.
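As a rough illustration of what "local G+C content" means, the sketch below computes the G+C fraction in consecutive windows along a sequence. The function names, the toy sequence and the window size are illustrative assumptions, not part of the article; real analyses use much larger windows (typically kilobases or more) on chromosome‐scale sequences.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T).

    Returns None when the sequence contains no unambiguous base (e.g. all N).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def windowed_gc(seq, window):
    """G+C fraction in consecutive, non-overlapping windows of a given size."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Toy sequence (an assumption for illustration): a G+C-rich stretch,
# a G+C-poor stretch, then another G+C-rich stretch.
toy = "GCGCGCGCGG" + "ATATATATTA" + "GCGGCCGCAT"
profile = windowed_gc(toy, 10)
print(profile)           # [1.0, 0.0, 0.8]
print(gc_fraction(toy))  # 0.6
```

The overall fraction (0.6 here) hides the alternation that the windowed profile makes visible, which is the sense in which local G+C fluctuates around the average.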

Key Concepts:

  • The human genome is characterised by alternating regions of high and low G+C content.

  • This pattern of alternating high and low G+C regions is not unique to the human genome. It is also present in species that diverged from a common ancestor a few hundred million years ago.

  • Both mutational and nonmutational mechanisms could be responsible for shaping G+C content in humans. The former create new alleles with a changed G+C content; the latter bias the transmission of existing polymorphic alleles towards either higher or lower G+C.

  • The suggestion that natural selection plays a crucial role in maintaining high G+C regions is based on the observation that these regions are also high in several biological activities (translation, transcription and recombination).

  • Chromatin structure may provide a differential mutational environment by being accessible (open) or inaccessible (closed) to other protein molecules.

Keywords: human genome; base composition; genome evolution; DNA; mutations; chromatin structure

Figure 1.

Each human chromosome is partitioned into four types of sequences: within/outside GENCODE genes (http://www.gencodegenes.org/) and within/outside transposable elements (repetitive sequences). Sequences that are not transposable elements are called unique sequences. (a) Chromosome‐specific G+C calculated from the whole chromosome (circles), non‐GENCODE (or intergenic) sequences (black line), GENCODE/repeat sequences (red line), GENCODE/unique sequences (orange line), non‐GENCODE/repeat sequences (blue line) and non‐GENCODE/unique sequences (green line). (b) Scatter plot of three types of G+C (red, orange and blue, on the y‐axis) versus the G+C obtained from non‐GENCODE/unique sequences (on the x‐axis). Chromosomes 22 and 19 are (G+C)‐rich, whereas chromosomes 4 and 13 are (G+C)‐poor.
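The four‐way partition described in this caption can be sketched as follows. This is only a minimal illustration under stated assumptions: repeats are taken to be marked by lowercase letters (the soft‐masking convention used in many genome FASTA files), and `gene_intervals` is a hypothetical list of 0‐based, half‐open (start, end) coordinates standing in for a real gene annotation. It is not the procedure used to produce the figure.

```python
from collections import defaultdict


def partitioned_gc(seq, gene_intervals):
    """G+C fraction for each (gene/non-gene, repeat/unique) class.

    Assumptions (illustrative only): lowercase bases are repeats
    (soft-masked sequence), and gene_intervals holds 0-based,
    half-open (start, end) pairs.
    """
    # Mark every position that falls inside at least one gene interval.
    in_gene = [False] * len(seq)
    for start, end in gene_intervals:
        for i in range(max(start, 0), min(end, len(seq))):
            in_gene[i] = True

    # class -> [count of G or C, count of A, C, G or T]
    counts = defaultdict(lambda: [0, 0])
    for base, gene in zip(seq, in_gene):
        upper = base.upper()
        if upper not in "ACGT":
            continue  # skip N and other ambiguity codes
        key = ("gene" if gene else "non-gene",
               "repeat" if base.islower() else "unique")
        counts[key][0] += upper in "GC"
        counts[key][1] += 1
    return {key: gc / total for key, (gc, total) in counts.items()}


# Toy example: positions 0-5 are "genic"; lowercase marks repeats.
result = partitioned_gc("GCGCatATgc", [(0, 6)])
for key in sorted(result):
    print(key, result[key])
```

Keeping the four classes separate is what allows a comparison such as panel (b), where each class is plotted against the non‐gene/unique baseline.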

Figure 2.

Schematic illustration of gene conversion. Each vertical bar represents a single‐stranded DNA, and each double bar represents a haploid copy of the genome near a chromosome position. The genotype at the base position can be either AG (the person is heterozygous) or AA (homozygous). If everybody in a population is homozygous with the same genotype, the site is not polymorphic (and is no longer a single‐nucleotide polymorphism). Gene conversion does not introduce a new allele; it can only distort the pool of variants from which a gamete is randomly picked for the next generation. If the site is not polymorphic, gene conversion has no effect. If a person is homozygous (c), gene conversion also has no effect. For a polymorphic site in a heterozygous person, gene conversion may increase G+C if it is (G+C)‐biased (a) or decrease G+C if it is (A+T)‐biased (b). For this single site, the change in the proportion of C or G bases is 0.75 − 0.5 = 0.25 (a) or 0.25 − 0.5 = −0.25 (b).
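The arithmetic in this caption can be reproduced with a small sketch. It assumes, as one reading of the 0.75 and 0.5 values, that the variant pool of one person is represented by four copies of the site and that a conversion event changes exactly one copy; the function names and this four‐copy representation are illustrative assumptions rather than the article's model.

```python
def gc_share(pool):
    """Share of copies in the pool that carry G or C at the site."""
    return sum(base in "GC" for base in pool) / len(pool)


def convert_one_copy(pool, bias):
    """Return the pool after one biased conversion event.

    bias="GC" converts one A/T copy to G; bias="AT" converts one G/C
    copy to A. A pool with a single allele (homozygous person) is
    returned unchanged, because conversion cannot create a new allele.
    """
    pool = list(pool)
    has_gc = any(base in "GC" for base in pool)
    has_at = any(base in "AT" for base in pool)
    if not (has_gc and has_at):
        return pool  # homozygous: no effect
    targets, new_base = ("AT", "G") if bias == "GC" else ("GC", "A")
    index = next(k for k, base in enumerate(pool) if base in targets)
    pool[index] = new_base
    return pool


het = list("AAGG")  # heterozygous AG person: half the copies carry G
hom = list("AAAA")  # homozygous AA person

gain = gc_share(convert_one_copy(het, "GC")) - gc_share(het)
loss = gc_share(convert_one_copy(het, "AT")) - gc_share(het)
none = gc_share(convert_one_copy(hom, "GC")) - gc_share(hom)
print(gain, loss, none)  # 0.25 -0.25 0.0
```

The homozygous case returns the pool untouched, which mirrors the caption's point that gene conversion redistributes existing alleles and has no effect without heterozygosity.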




Further Reading

Bernardi G (2006) Structural and Evolutionary Genomics: Natural Selection in Genome Evolution. Amsterdam: Elsevier.

Eyre‐Walker A and Hurst LD (2001) The evolution of isochores. Nature Reviews Genetics 2: 549–555.

Lynch M (2007) The Origins of Genome Architecture. Sunderland, MA, USA: Sinauer Associates.

Wolfe KH and Li WH (2003) Molecular evolution meets the genomics revolution. Nature Genetics 33: 255–265.

How to Cite
Li, Wentian (Apr 2013) G+C Content Evolution in the Human Genome. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0021751]