Protein‐Coding Segments: Evolution of Exon–Intron Gene Structure

Abstract

Many eukaryotic genes are interrupted by noncoding regions of deoxyribonucleic acid (DNA) of variable size called introns, which give the genes an exon–intron structure. The origin and evolution of introns remain an important, long‐standing problem. The availability of multiple, complete genome sequences allows one to address many fundamental evolutionary questions. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous intron positions shared among animals, fungi, plants and protists. The data on intron positions were used as the starting point for evolutionary reconstruction with various phylogenetic methods. These methods reconstructed intron‐rich ancestors but in many cases inferred high, lineage‐specific levels of intron loss and gain. These results indicate that numerous introns were already present at the earliest stages of eukaryotic evolution and are compatible with the hypothesis that the original, catastrophic intron invasion accompanied the emergence of eukaryotic cells.

Key concepts:

  • The exon–intron gene structure is highly dynamic.

  • Orthologous genes from distant eukaryotic species share up to 25–30% of their intron positions.

  • Intron gains and losses might occur during limited time spans.

  • The Last Eukaryotic Common Ancestor was intron‐rich.

  • In the course of evolution, the splice signal shifts from exons to introns.

  • The scenario for the origin and evolution of introns that is most compatible with the results of comparative genomics is as follows: self‐splicing introns existed from the earliest stages of life's evolution; numerous spliceosomal introns then invaded genes of the emerging eukaryotes; and lineage‐specific loss and gain of spliceosomal introns followed.

Keywords: gene structure; molecular evolution; introns; exons

Figure 1.

The mitochondrial‐targeting presequence of the cytochrome c1 (Cyc1) precursor in potato originated from the plant GapC gene for cytosolic glyceraldehyde‐3‐phosphate dehydrogenase. The boxes represent exons. The three duplicate exons from GapC genes recombined with the mitochondrial‐originated nuclear gene encoding cytochrome c1 to form a complete cytochrome c1 precursor gene (Cyc1). The presequence encoded by the three exons in Cyc1 has a mitochondrial targeting function. The similarity between the donor peptide sequence (GapC) and the peptide sequence encoded by the shuffled exons in the acceptor gene (Cyc1) is significantly high (64%), as shown in the alignment of the two amino acid sequences. ‘|’ indicates identical amino acids and ‘+’ indicates amino acids that have similar physico‐chemical properties according to the BLASTP program (www.ncbi.nlm.nih.gov/blast).

Figure 2.

Conservation and variability of intron positions in orthologous eukaryotic genes. The data are for ribosomal protein L37. The intron positions are shown directly on the alignment and the conversion of the intron‐alignment mapping into a presence–absence matrix is illustrated. ‘0’ indicates the absence of an intron and ‘1’ indicates the presence of an intron in the given alignment position (shown on top of the table). The highly conserved intron positions are highlighted (green). Species abbreviations: Homsa, human Homo sapiens; Caeel, worm Caenorhabditis elegans; Drome, fly Drosophila melanogaster; Anoga, mosquito Anopheles gambiae; Schpo, fungus Schizosaccharomyces pombe; Sacce, fungus Saccharomyces cerevisiae; Arath, plant Arabidopsis thaliana; Dicdi, slime mould Dictyostelium discoideum and Plafa, protist Plasmodium falciparum.
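
The conversion of an intron‐alignment mapping into a presence–absence matrix, as described in this caption, can be sketched in a few lines of Python. This is a minimal illustration only: the function name `presence_absence_matrix`, the input layout (species mapped to sets of alignment columns) and the toy column numbers are assumptions introduced here, not the data or the software used for the figure. A real analysis would also have to decide how to treat intron phase and poorly aligned regions.

```python
from typing import Dict, List, Set, Tuple


def presence_absence_matrix(
    intron_columns: Dict[str, Set[int]],
) -> Tuple[List[int], Dict[str, List[int]]]:
    """Turn per-species intron positions (alignment columns) into 0/1 rows."""
    # Collect every alignment column that carries an intron in any species.
    columns = sorted(set().union(*intron_columns.values()))
    # One row per species: 1 = intron present at that column, 0 = absent.
    matrix = {
        species: [1 if col in positions else 0 for col in columns]
        for species, positions in intron_columns.items()
    }
    return columns, matrix


# Toy input with hypothetical alignment columns (NOT the real L37 data).
toy = {
    "Homsa": {12, 45, 80},
    "Drome": {12, 80},
    "Sacce": {12},
    "Plafa": set(),
}

columns, matrix = presence_absence_matrix(toy)
print("columns:", columns)
for species, row in matrix.items():
    print(species, row)
```

In this toy input, column 12 is occupied in three of the four species and would correspond to a highly conserved position, whereas column 45 is specific to one species.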

Figure 3.

Conservation of intron positions in eukaryotic orthologous gene sets: the matrix of pairwise interspecies comparisons. The diagonal (bold) shows the total number of introns in the 684 analysed genes from the given species. The upper triangle of the matrix shows the raw number of introns shared by each pair of genomes and the lower triangle shows the value of $100 \times O_{ij}$, where $O_{ij}$ is the Jaccard coefficient, calculated as $O_{ij} = C_{ij}/(N_i + N_j - C_{ij})$; $C_{ij}$ is the number of introns shared by genomes $i$ and $j$, and $N_i$ and $N_j$ are the total numbers of introns in orthologous genes in genomes $i$ and $j$, respectively. Species abbreviations: Homsa, human Homo sapiens; Caeel, worm Caenorhabditis elegans; Drome, fly Drosophila melanogaster; Anoga, mosquito Anopheles gambiae; Schpo, fungus Schizosaccharomyces pombe; Sacce, fungus Saccharomyces cerevisiae; Arath, plant Arabidopsis thaliana and Plafa, protist Plasmodium falciparum.
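
The Jaccard coefficient used for the lower triangle is the number of shared introns divided by the number of distinct intron positions in the two genomes combined (shared introns are counted once, hence the subtraction in the denominator). A minimal Python sketch follows; the function names and the toy numbers are assumptions for illustration and are not values from the figure.

```python
from typing import Set


def jaccard_percent(shared: int, n_i: int, n_j: int) -> float:
    """Return 100 * C_ij / (N_i + N_j - C_ij) for two genomes i and j."""
    union = n_i + n_j - shared  # distinct intron positions in either genome
    if union == 0:
        return 0.0  # neither genome has introns; define the overlap as zero
    return 100.0 * shared / union


def jaccard_percent_from_sets(a: Set[int], b: Set[int]) -> float:
    """Same coefficient, computed directly from two sets of intron positions."""
    return jaccard_percent(len(a & b), len(a), len(b))


# Toy numbers (hypothetical): 25 shared introns, 100 and 50 introns in total.
print(jaccard_percent(25, 100, 50))  # 100 * 25 / (100 + 50 - 25) = 20.0
```

Unlike the raw shared counts in the upper triangle, this normalised value allows pairs of genomes with very different total intron numbers to be compared on the same scale.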

Figure 4.

Distribution of intron gain and loss rates across the phylogenetic tree of eukaryotes. The tree topology assumes the unikont–bikont division and the Coelomata clade. Node sizes are proportional to their (known or inferred) intron density, and the branches are colour‐coded: green, predominant intron gain; red, predominant intron loss and blue, balanced gain and loss. The sole brown branch (Ascomycota) designates extensive (significantly greater than the mean over the tree) gains and losses. Species and lineage abbreviations: Homsa, human Homo sapiens; Roden, mouse Mus musculus and rat Rattus norvegicus combined; Strpu, sea urchin Strongylocentrotus purpuratus; Cioin, Ciona intestinalis; Danre, fish Danio rerio; Galga, chicken Gallus gallus; Caeel, worm Caenorhabditis elegans; Drome, fly Drosophila melanogaster; Anoga, mosquito Anopheles gambiae; Apime, bee Apis mellifera; Nemve, sea anemone Nematostella vectensis; Cryne, fungus Cryptococcus neoformans; Schpo, fungus Schizosaccharomyces pombe; Sacce, fungus Saccharomyces cerevisiae; Aspfu, fungus Aspergillus fumigatus; Neucr, fungus Neurospora crassa; Arath, plant Arabidopsis thaliana; Orysa, plant Oryza sativa (rice); Thepa, protist Theileria parva; Plafa, protist Plasmodium falciparum and Dicdi, slime mould Dictyostelium discoideum.

Further Reading

Carmel L, Wolf YI, Rogozin IB and Koonin EV (2007) Three distinct modes of intron dynamics in the evolution of eukaryotes. Genome Research 17: 1034–1044.

Csurös M, Rogozin IB and Koonin EV (2008) Extremely intron‐rich genes in the alveolate ancestors inferred with a flexible maximum‐likelihood approach. Molecular Biology and Evolution 25: 903–911.

Fedorova L and Fedorov A (2003) Introns in gene evolution. Genetica 118: 123–131.

Koonin EV (2006) The origin of introns and their role in eukaryogenesis: a compromise solution to the introns‐early versus introns‐late debate? Biology Direct 1: 22.

Logsdon JM Jr (1998) The recent origins of spliceosomal introns revisited. Current Opinion in Genetics & Development 8: 501–508.

Long M and DeSouza SJ (1998) Intron–exon structures: from molecular to population biology. Advances in Genome Biology 5A: 143–178.

Rogozin IB, Sverdlov AV, Babenko VN and Koonin EV (2005) Analysis of evolution of exon–intron structure of eukaryotic genes. Briefings in Bioinformatics 6: 118–134.

Roy SW and Gilbert W (2006) The evolution of spliceosomal introns: patterns, puzzles and progress. Nature Reviews Genetics 7: 211–221.

Sharp PA (1994) Split genes and RNA splicing. Cell 77: 805–815.

Yoshihama M, Nguyen HD and Kenmochi N (2007) Intron dynamics in ribosomal protein genes. PLoS ONE 2: e141.

How to Cite
Rogozin, Igor B (Sep 2009) Protein‐Coding Segments: Evolution of Exon–Intron Gene Structure. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0000887.pub2]