Gene Structure and Organization


The sequence of the human genome enables the delineation of genes and the analysis of their structural properties and organization within the chromosome.

Keywords: gene; genome; transcript; intron; exon

Figure 1.

Organization of exons and introns. After transcription, introns are removed by the cellular splicing machinery. Alternative usage of exons allows the inclusion or exclusion of amino acid domains in the polypeptide sequence. Shown is an alternatively spliced transcript comprising a truncated form of exon 6 and the deletion of exon 8.
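The exon inclusion and exclusion described in this caption can be sketched in a few lines of Python. This is a minimal illustration only: the sequence is invented, and the names `segments`, `exon_intervals` and `splice` are hypothetical, not taken from the article.

```python
# Toy pre-mRNA built from labelled segments (invented sequence, not real data).
segments = [
    ("exon", "AUGGCC"),
    ("intron", "guaaguuuucag"),
    ("exon", "GAUUAC"),
    ("intron", "guaagccccag"),
    ("exon", "UGGUAA"),
]

def exon_intervals(segments):
    """Return (pre_mrna, exons), where exons are 0-based half-open intervals."""
    pre_mrna, exons, pos = "", [], 0
    for kind, seq in segments:
        if kind == "exon":
            exons.append((pos, pos + len(seq)))
        pre_mrna += seq
        pos += len(seq)
    return pre_mrna, exons

def splice(pre_mrna, exons, skip=()):
    """Join exon sequences in order, omitting exons whose index is in `skip`."""
    return "".join(
        pre_mrna[start:end]
        for idx, (start, end) in enumerate(exons)
        if idx not in skip
    )

pre_mrna, exons = exon_intervals(segments)
constitutive = splice(pre_mrna, exons)           # all three exons retained
exon2_skipped = splice(pre_mrna, exons, skip={1})  # alternative form without exon 2
```

Skipping an exon here simply drops its interval before joining; a truncated exon, as in the figure, would correspond to shortening one interval instead.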

Figure 2.

Splice‐site consensus sequences. Most of the recognition sequences for splice donors and acceptors lie in the introns. Capital letters indicate a base that is found consistently in that position; small letters indicate bases that are typically found there. Pyrimidine tracts (y) lie in the intron near the acceptor site. The letter ‘n’ indicates that any one of the four nucleotides may be found in this position.
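As a rough illustration of how such consensus features might be checked computationally, the sketch below tests a candidate intron (written as DNA) for the well-known GT donor and AG acceptor dinucleotides and for a pyrimidine-rich stretch just upstream of the acceptor. The function names, the 12-base tract length and the 60% threshold are assumptions chosen for the example, not values from the figure.

```python
def pyrimidine_fraction(seq):
    """Fraction of bases in `seq` that are pyrimidines (C or T)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CT" for base in seq) / len(seq)

def has_canonical_splice_sites(intron, tract_len=12, min_pyrimidine=0.6):
    """Heuristic check: GT at the 5' end, AG at the 3' end, and a
    pyrimidine-rich tract immediately upstream of the AG.

    tract_len and min_pyrimidine are illustrative assumptions.
    """
    intron = intron.upper()
    if len(intron) < tract_len + 4:
        return False
    tract = intron[-(tract_len + 2):-2]  # bases just before the terminal AG
    return (
        intron.startswith("GT")
        and intron.endswith("AG")
        and pyrimidine_fraction(tract) >= min_pyrimidine
    )

# Invented test sequences.
good = "GTAAGTACTAACTTTCTTTTCTCCAG"
purine_rich = "GTAAGTGGAGGAGGAGGAAGGAGGAG"
```

A real splice-site predictor would weigh every position of the consensus rather than apply hard cut-offs; this only mirrors the features named in the caption.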

Figure 3.

Noncoding portions of the RNA. RNA polymerase initiates transcription at a start site, the promoter, and ends transcription near a polyadenylation signal sequence ‘aauaaa’. Translation is initiated with the methionine‐coding codon AUG. The translation start site is often, but need not be, found in the first exon. Translation is terminated by one of the three stop codons, UAA, UAG or UGA. The 5′ and 3′ untranslated regions (UTRs) lie on either side of the translated portion of the message.
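The division of a message into 5′ UTR, translated region and 3′ UTR can be sketched as follows. The sketch assumes, for simplicity, that translation begins at the first AUG and ends at the first in-frame stop codon; the function name `split_mrna` and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Split an mRNA into (5' UTR, coding region, 3' UTR).

    Simplifying assumption: translation starts at the first AUG and stops
    at the first in-frame stop codon. Returns None if either is missing.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start + 3, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # include the stop codon in the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    return None

# Invented message: short 5' UTR, four codons, 3' UTR with an AAUAAA signal.
message = "GGCACC" + "AUGGCUUGGUAA" + "CUGAAUAAAGC"
utr5, coding, utr3 = split_mrna(message)
has_polya_signal = "AAUAAA" in utr3
```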

Figure 4.

Cis‐regulatory elements. The basal transcription apparatus (BTA) initiates RNA synthesis in response to specific input signals. The signals consist of transcription factors (proteins) and transcription‐factor‐binding sites called cis‐regulatory elements, which are located in the DNA sequence. Signaling the BTA to start transcription may require the coordinate presence of several transcription factors (squares) interacting with the DNA (thick line) and each other in a cis‐regulatory module. Repressors (circles) may prevent such interaction. A gene can be regulated by more than one module.

Figure 5.

Comparison of fibrillin cDNA to the human genomic sequence. In this dot matrix plot, the query sequence – a full‐length cDNA for fibrillin, which is involved in the connective tissue disorder Marfan syndrome – is compared against finished sequence from chromosome 15 (target). Each solid vertical line indicates a match between an exon of the mRNA and the corresponding genomic sequence. The spaces between the vertical lines are the introns. Note the large intron between the fifth and sixth exons. In this gene, over 60 exons span more than 200 kb of genomic sequence. Numbers on the x and y axes indicate the length of the cDNA and genomic sequences.
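The idea behind a dot matrix plot can be sketched with exact word matches: every position where a short word of the cDNA also occurs in the genomic sequence yields one dot. The code below is a simplified illustration with invented sequences; the word length `k` and the name `dot_matrix` are assumptions, and real tools tolerate mismatches and filter noise.

```python
from collections import defaultdict

def dot_matrix(query, target, k=6):
    """Return (i, j) pairs where the k-mer at query[i] equals the k-mer at target[j]."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    return [
        (i, j)
        for i in range(len(query) - k + 1)
        for j in index.get(query[i:i + k], [])
    ]

# Invented two-exon gene: the cDNA lacks the intron present in the genomic DNA.
exon1, intron, exon2 = "ATGGCCATTG", "GTAAGTCTTTTCCTCTCCAG", "TACGGATCCA"
genomic = exon1 + intron + exon2
cdna = exon1 + exon2

dots = dot_matrix(cdna, genomic)
offsets = {j - i for i, j in dots}  # a jump in offset marks an intron
```

Dots from the first exon share one offset between cDNA and genomic coordinates, and dots from the second exon are shifted by the intron length, which is the gap seen between lines in the plot.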

Figure 6.

Example of gene distribution across a portion of a chromosome. This screenshot from the Golden Path assembly of the human genome illustrates a large gene (diaphanous homolog 2; DIAPH2), followed by a gene desert, followed by a gene‐dense region. Note how the G+C percentage increases in the gene‐dense region (darker gray shading).
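The G+C percentage shown as shading in such a display can be computed over fixed windows of sequence. The sketch below is one simple way to do so; the function names and the window size are assumptions for illustration and do not describe how the browser itself computes its track.

```python
def gc_percent(seq):
    """Percentage of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def windowed_gc(seq, window, step=None):
    """Return (start, G+C percent) for each full window along `seq`.

    Windows are non-overlapping unless a smaller `step` is given.
    """
    step = step or window
    return [
        (start, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Invented sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATATATAT" + "GCGCGCGCGC"
profile = windowed_gc(toy, window=10)
```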


Further Reading

Aparicio SAJR (2000) How to count human genes. Nature Genetics 25: 129–130.

Davidson EH (2001) Genomic Regulatory Systems. San Diego, CA: Academic Press.

Dunham I, Shimizu N, Roe BA, et al. (1999) The DNA sequence of human chromosome 22. Nature 402: 489–495.

Hattori M, Fujiyama A, Taylor TD, et al. (2000) The DNA sequence of human chromosome 21. Nature 405: 311–319.

International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.

Rogic S, Mackworth AK and Ouellette FBF (2001) Evaluation of gene‐finding programs on mammalian sequences. Genome Research 11: 817–832.

Venter JC, Adams MD, Myers EW, et al. (2001) The sequence of the human genome. Science 291: 1304–1351.

Wong GKS, Passey DA and Yu J (2001) Most of the human genome is transcribed. Genome Research 11: 1975–1977.

Wright FA, Lemon WJ, Zhao WD, et al. (2001) A draft annotation and overview of the human genome. Genome Biology 2: 0025.1–0025.18.

Web Links

NCBI BLAST. A search engine for identifying similarities among DNA or protein sequences.

GeneCards™. An online database of human genes, their products and their involvement in disease.

National Center for Biotechnology Information. A comprehensive resource for literature searching and databases.

Genome Browser (Golden Path). A comprehensive visualization tool for features of genomic sequence such as genes, repeats, and GC content.

How to Cite
Rowen, Lee (Sep 2005) Gene Structure and Organization. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1038/npg.els.0005008]