Evolutionary Origin of Orphan Genes


Orphan genes are genes that occur in specific evolutionary lineages and show no similarity to genes outside these lineages; they have therefore also been named taxonomically restricted genes. Until recently, they were thought to emerge through duplication–divergence processes, but it is now becoming clear that they can also arise de novo from noncoding deoxyribonucleic acid (DNA). This latter process may even occur much more frequently than previously assumed. Genomes appear to harbour many transcripts in a transition stage from nonfunctional to functional genes, also known as protogenes, which are exposed to evolutionary testing and can become fixed when they turn out to be useful. Orphan genes may have played key roles in generating lineage‐specific adaptations and could be a continuous source of evolutionary novelties. Their existence suggests that functional ribonucleic acids (RNAs) and proteins can arise relatively easily from random nucleotide sequences, although these processes still need to be explored experimentally.

Key Concepts:

  • Orphan genes, or taxonomically restricted genes, have arisen at all levels of the phylogenetic hierarchy.

  • All genes that cannot be traced to the first cellular ancestor are orphan genes in some lineages.

  • New genes may not only arise through gene duplication, but also through de novo evolution.

  • Spurious transcripts can give rise to protogenes, from which new functional genes evolve.

  • Emergence of new genes from protogenes is an active process in all extant genomes.

  • New genes may first act as noncoding RNAs before obtaining a functional reading frame.

  • Overprinting of existing reading frames with new reading frames is another route to the de novo evolution of gene functions.

  • Orphan genes may contribute to lineage‐specific adaptations.

  • Orphan genes may carry information on the evolutionary past that can be harnessed by the phylostratigraphic approach.

  • Gene evolution follows a continuous birth–death dynamic.

Keywords: gene emergence; phylostratigraphy; noncoding RNA; lineage‐specific adaptations; overprinting

Figure 1.

Examples of phylostratigraphic analyses of the mouse genome. (a) Depiction of 20 phylostrata (PS) across the whole phylogeny, ranging from the cellular origin to the extant house mouse (Mus musculus domesticus). Each node is represented by several fully sequenced genomes (or at least extensive EST data) for the respective phylogenetic split. All annotated protein‐coding genes of the mouse were subjected to BLAST analysis to find the oldest homologue within this phylogeny. The bar graphs to the right depict the numbers of genes found at the respective levels. The procedure of finding the oldest homologue necessarily depends to some extent on the BLAST cutoff chosen (see the discussion in Tautz and Domazet‐Lošo, 2011), but the general pattern would not change much with different cutoffs or search algorithms. (b) Same analysis as above, but including the time frame for the separation of the nodes, with gene numbers scaled to the respective time intervals (note the nonlinear time scale, chosen to allow optimal resolution of the nodes). This depiction allows rates of gene emergence to be inferred and shows that the rate is highest in the youngest lineage leading to the extant species.
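
The assignment step described in this caption can be illustrated with a minimal Python sketch. It assumes that the BLAST searches have already been run and summarised, for each gene, as the set of phylostrata in which a hit passed the chosen cutoff. The function names, the toy three-level phylogeny and the interval lengths are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def assign_phylostratum(hit_strata, focal_stratum):
    """Return the oldest phylostratum (lowest index) with a detectable homologue.

    Assumption: strata are numbered from 1 (cellular origin) up to
    focal_stratum (the extant focal species). A gene without any hit in an
    older stratum is assigned to the focal stratum itself.
    """
    return min(hit_strata, default=focal_stratum)


def genes_per_stratum(gene_hits, focal_stratum):
    """Count how many genes are assigned to each phylostratum."""
    counts = Counter(
        assign_phylostratum(hits, focal_stratum) for hits in gene_hits.values()
    )
    return {ps: counts.get(ps, 0) for ps in range(1, focal_stratum + 1)}


def emergence_rates(counts, interval_lengths):
    """Scale gene counts by the length of each time interval (panel b)."""
    return {ps: counts[ps] / interval_lengths[ps] for ps in counts}


# Toy input (hypothetical): phylostrata with a BLAST hit for each gene.
gene_hits = {
    "geneA": {1, 2, 3},  # homologues back to the cellular origin
    "geneB": {2, 3},     # oldest homologue in stratum 2
    "geneC": {3},        # found only in the focal lineage
    "geneD": set(),      # no hit at all: treated as youngest
}
# Hypothetical interval lengths in million years for strata 1..3.
interval_myr = {1: 1000.0, 2: 100.0, 3: 10.0}

counts = genes_per_stratum(gene_hits, focal_stratum=3)
rates = emergence_rates(counts, interval_myr)
print(counts)
print(rates)
```

In this toy example the youngest stratum has the highest rate per unit time simply because its interval is shortest relative to its gene count, which is the kind of comparison panel (b) is designed to make visible.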

Figure 2.

A general depiction of the life cycle of genes (after Carvunis et al., 2012). This representation assumes that genes emerge regularly out of nongenic sequences via a protogene phase. Once established as functional genes, they can expand into gene families. Alternatively, gene copies can also be lost again and become nongenic sequences.

Figure 3.

Depiction of the inference scheme for de novo evolution out of nongenic DNA. This scheme depicts a phylogeny of six related species, of which species 6 is the focal species in which the hypothesis of de novo gene evolution is tested. To make a solid case, one should show that the corresponding DNA region is present in the related species and syntenic in these species (indicated here by the flanking genes Abc and Xyz). The region should also still be alignable, that is, the species compared should be sufficiently close to each other that even neutrally diverging sequences have not yet acquired too many mutations. Finally, no outgroup should show any sign of a gene at the respective position, possibly apart from the most closely related ones, which could have a protogene or an RNA gene at the position.
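
The criteria in this caption can be written down as a simple checklist. The following Python sketch is only an illustration: it assumes that synteny, alignability and the state of the locus have already been determined for every outgroup, and the class, field and state names are hypothetical.

```python
from dataclasses import dataclass

# Locus states that do not contradict a de novo origin in the focal species.
ALLOWED_IN_CLOSE_RELATIVES = {"none", "protogene", "rna_gene"}
ALLOWED_IN_DISTANT_OUTGROUPS = {"none"}


@dataclass
class OutgroupLocus:
    species: str
    syntenic: bool        # region present between the same flanking genes
    alignable: bool       # region can still be aligned to the focal species
    locus_state: str      # "none", "protogene", "rna_gene" or "coding_gene"
    close_relative: bool = False  # one of the most closely related species


def supports_de_novo_origin(outgroups):
    """Return True if every outgroup satisfies the criteria in the caption."""
    for og in outgroups:
        # The orthologous region must be identifiable and comparable.
        if not (og.syntenic and og.alignable):
            return False
        # Only the closest relatives may carry a protogene or an RNA gene.
        allowed = (
            ALLOWED_IN_CLOSE_RELATIVES
            if og.close_relative
            else ALLOWED_IN_DISTANT_OUTGROUPS
        )
        if og.locus_state not in allowed:
            return False
    return True


# Hypothetical evidence for species 1-5 (species 6 is the focal species).
consistent = [
    OutgroupLocus("species1", True, True, "none"),
    OutgroupLocus("species2", True, True, "none"),
    OutgroupLocus("species3", True, True, "none"),
    OutgroupLocus("species4", True, True, "none"),
    OutgroupLocus("species5", True, True, "protogene", close_relative=True),
]
# Same data, but a distant outgroup carries a coding gene at the position.
contradicted = [OutgroupLocus("species2", True, True, "coding_gene")] + consistent

print(supports_de_novo_origin(consistent))
print(supports_de_novo_origin(contradicted))
```

A real analysis would replace the boolean fields with alignment and annotation evidence. The sketch only makes explicit that a single outgroup with a gene at the position, or a region that cannot be placed in synteny, is enough to weaken the case.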

Figure 4.

Example of a well‐studied overprinted locus. Both genes are tumour suppressor genes, but p16INK4a is the older one. p19ARF (ARF, alternative reading frame) originated through a new exon that splices to the central exon of p16INK4a but is translated in a different frame. Both proteins were shown to be functional (Quelle et al., 1995). Boxes indicate exons; filled boxes indicate protein‐coding regions.
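
The principle behind overprinting, that one nucleotide sequence encodes unrelated peptides when read in different frames, can be shown with a short Python sketch. The sequence used here is invented for illustration and is not the p16INK4a/p19ARF sequence. The codon table is the standard genetic code, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Incomplete trailing codons are ignored; '*' marks a stop codon.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Invented toy sequence shared by both reading frames.
shared_exon = "ATGGCCAAATGA"

frame0 = translate(shared_exon, frame=0)
frame1 = translate(shared_exon, frame=1)
print(frame0)
print(frame1)
```

The two translations share no residues even though they derive from the same nucleotides, which is why the products of an overprinted locus can be unrelated proteins.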



Alba MM and Castresana J (2007) On homology searches by protein Blast and the characterization of the age of genes. BMC Evolutionary Biology 7: 53.

Altschul SF, Madden TL, Schaffer AA et al. (1997) Gapped BLAST and PSI‐BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389–3402.

Begun DJ, Lindfors HA, Kern AD and Jones CD (2007) Evidence for de novo evolution of testis‐expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics 176: 1131–1137.

Berretta J and Morillon A (2009) Pervasive transcription constitutes a new level of eukaryotic genome regulation. EMBO Reports 10: 973–982.

Bornberg‐Bauer E, Huylmans AK and Sikosek T (2010) How do new proteins arise? Current Opinion in Structural Biology 20: 390–396.

Cai J, Zhao RP, Jiang HF and Wang W (2008) De novo origination of a new protein‐coding gene in Saccharomyces cerevisiae. Genetics 179: 487–496.

Carninci P (2010) RNA dust: where are the genes? DNA Research 17: 51–59.

Carvunis AR, Rolland T, Wapinski I et al. (2012) Proto‐genes and de novo gene birth. Nature 487: 370–374.

Chen SD, Zhang YE and Long MY (2010) New genes in Drosophila quickly become essential. Science 330: 1682–1685.

Chothia C (1992) Proteins – 1000 families for the molecular biologist. Nature 357: 543–544.

Chung WY, Wadhawan S, Szklarczyk R, Pond SK and Nekrutenko A (2007) A first look at ARFome: dual‐coding genes in mammalian genomes. PLoS Computational Biology 3: 855–861.

Clark MB, Amaral PP, Schlesinger FJ et al. (2011) The reality of pervasive transcription. PLoS Biology 9.

Demuth JP and Hahn MW (2009) The life and death of gene families. Bioessays 31: 29–39.

Domazet‐Lošo T, Brajkovic J and Tautz D (2007) A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends in Genetics 23: 533–539.

Domazet‐Lošo T and Tautz D (2003) An evolutionary analysis of orphan genes in Drosophila. Genome Research 13: 2213–2219.

Domazet‐Lošo T and Tautz D (2010a) A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature 468: 815–818.

Domazet‐Lošo T and Tautz D (2010b) Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa. BMC Biology 8: 66.

Drummond DA and Wilke CO (2008) Mistranslation‐induced protein misfolding as a dominant constraint on coding‐sequence evolution. Cell 134: 341–352.

Dujon B (1996) The yeast genome project: what did we learn? Trends in Genetics 12: 263–270.

Dyson HJ and Wright PE (2005) Intrinsically unstructured proteins and their functions. Nature Reviews Molecular Cell Biology 6: 197–208.

Fischer D and Eisenberg D (1999) Finding families for genomic ORFans. Bioinformatics 15: 759–762.

Heinen T, Staubach F, Haming D and Tautz D (2009) Emergence of a new gene from an intergenic region. Current Biology 19: 1527–1531.

Jacob F (1977) Evolution and tinkering. Science 196: 1161–1166.

Kaessmann H (2010) Origins, evolution, and phenotypic impact of new genes. Genome Research 20: 1313–1326.

Kaessmann H, Vinckenbosch N and Long MY (2009) RNA‐based gene duplication: mechanistic and evolutionary insights. Nature Reviews Genetics 10: 19–31.

Keeling PJ and Palmer JD (2008) Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9: 605–618.

Keese PK and Gibbs A (1992) Origins of genes: big bang or continuous creation? Proceedings of the National Academy of Sciences of the USA 89: 9489–9493.

Khalturin K, Anton‐Erxleben F, Sassmann S et al. (2008) A novel gene family controls species‐specific morphological traits in hydra. PLoS Biology 6: 2436–2449.

Khalturin K, Hemmrich G, Fraune S, Augustin R and Bosch TCG (2009) More than just orphans: are taxonomically restricted genes important in evolution? Trends in Genetics 25: 404–413.

Kleene KC (2005) Sexual selection, genetic conflict, selfish genes, and the atypical patterns of gene expression in spermatogenic cells. Developmental Biology 277: 16–26.

Klemke M, Kehlenbach RH and Huttner WB (2001) Two overlapping reading frames in a single exon encode interacting proteins – a novel way of gene usage. EMBO Journal 20: 3849–3860.

Knowles DG and McLysaght A (2009) Recent de novo origin of human protein‐coding genes. Genome Research 19: 1752–1759.

Krylov DM, Wolf YI, Rogozin IB and Koonin EV (2003) Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Research 13: 2229–2235.

Kutter C, Watt S, Stefflova K et al. (2012) Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genetics 8.

Levine MT, Jones CD, Kern AD, Lindfors HA and Begun DJ (2006) Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X‐linked and exhibit testis‐biased expression. Proceedings of the National Academy of Sciences of the USA 103: 9935–9939.

Li CY, Zhang Y, Wang ZB et al. (2010) A human‐specific de novo protein‐coding gene associated with human brain functions. PLoS Computational Biology 6: e1000734.

Long M, Betran E, Thornton K and Wang W (2003) The origin of new genes: glimpses from the young and old. Nature Reviews Genetics 4: 865–875.

Michel AM, Choudhury KR, Firth AE et al. (2012) Observation of dually decoded regions of the human genome using ribosome profiling data. Genome Research 22: 2219–2229.

Nekrutenko A, Wadhawan S, Goetting‐Minesky P and Makova KD (2005) Oscillating evolution of a mammalian locus with overlapping reading frames: an XL alpha s/ALEX relay. PLoS Genetics 1: 197–204.

Ohno S (1970) Evolution by Gene Duplication. New York: Springer.

Ohno S (1984) Birth of a unique enzyme from an alternative reading frame of the preexisted, internally repetitious coding sequence. Proceedings of the National Academy of Sciences of the USA 81: 2421–2425.

Orengo CA and Thornton JM (2005) Protein families and their evolution – a structural perspective. Annual Review of Biochemistry 74: 867–900.

Polev D (2012) Transcriptional noise as a driver of gene evolution. Journal of Theoretical Biology 293: 27–33.

Quelle DE, Zindy F, Ashmun RA and Sherr CJ (1995) Alternative reading frames of the INK4a tumor suppressor gene encode two unrelated proteins capable of inducing cell cycle arrest. Cell 83: 993–1000.

Quint M, Drost HG, Gabel A et al. (2012) A transcriptomic hourglass in plant embryogenesis. Nature 490: 98–101.

Remmert M, Biegert A, Hauser A and Soding J (2012) HHblits: lightning‐fast iterative protein sequence searching by HMM‐HMM alignment. Nature Methods 9: 173–175.

Sabath N, Wagner A and Karlin D (2012) Evolution of viral proteins originated de novo by overprinting. Molecular Biology and Evolution 29: 3767–3780.

Schlessinger A, Schaefer C, Vicedo E et al. (2011) Protein disorder – a breakthrough invention of evolution? Current Opinion in Structural Biology 21: 412–418.

Sherr CJ (2006) Divorcing ARF and p53: an unsettled case. Nature Reviews Cancer 6: 663–673.

Siepel A (2009) Darwinian alchemy: human genes from noncoding DNA. Genome Research 19: 1693–1695.

Soding J and Lupas AN (2003) More than the sum of their parts: on the evolution of proteins from peptides. Bioessays 25: 837–846.

Tautz D (2009) Polycistronic peptide coding genes in eukaryotes – how widespread are they? Briefings in Functional Genomics & Proteomics 8: 68–74.

Tautz D and Domazet‐Lošo T (2011) The evolutionary origin of orphan genes. Nature Reviews Genetics 12: 692–702.

Toll‐Riera M, Bosch N, Bellora N et al. (2009) Origin of primate orphan genes: a comparative genomics approach. Molecular Biology and Evolution 26: 603–612.

Volff JN (2006) Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. Bioessays 28: 913–922.

Wilson BA and Masel J (2011) Putatively noncoding transcripts show extensive association with ribosomes. Genome Biology and Evolution 3: 1245–1252.

Wilson GA, Bertrand N, Patel Y et al. (2005) Orphans as taxonomically restricted and ecologically important genes. Microbiology 151: 2499–2501.

Wilson GA, Feil EJ, Lilley AK and Field D (2007) Large‐scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes. PLoS ONE 2: e324.

Wu DD, Irwin DM and Zhang YP (2011) De novo origin of human protein‐coding genes. PLoS Genetics 7: e1002379.

Xie C, Zhang YE, Chen JY et al. (2012) Hominoid‐specific de novo protein‐coding genes originating from long non‐coding RNAs. PLoS Genetics 8: e1002942.

Yin YB and Fischer D (2006) On the origin of microbial ORFans: quantifying the strength of the evidence for viral lateral transfer. BMC Evolutionary Biology 6: 63.

Yin YB and Fischer D (2008) Identification and investigation of ORFans in the viral world. BMC Genomics 9: 24.

Zhang JZ (2003) Evolution by gene duplication: an update. Trends in Ecology and Evolution 18: 292–298.

Zhang Y, Hubner IA, Arakaki AK, Shakhnovich E and Skolnick J (2006) On the origin and highly likely completeness of single‐domain protein structures. Proceedings of the National Academy of Sciences of the USA 103: 2605–2610.

Zhou Q, Zhang GJ, Zhang Y et al. (2008) On the origin of new genes in Drosophila. Genome Research 18: 1446–1455.

How to Cite
Tautz, Diethard, Neme, Rafik, and Domazet‐Lošo, Tomislav (May 2013) Evolutionary Origin of Orphan Genes. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0024601]