Gene Families


The multitude of genes falls into a much smaller number of gene families. Understanding gene families allows the evolutionary history of genes to be traced and allows experimental results from one gene to be transferred to related genes.

Keywords: sequence alignment; domain; gene duplication; sequence similarity; alignment; phylogenetic tree

Figure 1.

Structure of the protein phospholipase C, from the bacterium that causes gas gangrene. The dashed line shows the delineation of the protein into two structural domains.

Figure 2.

BLAST alignment of two protein sequences.

Figure 3.

BLAST result where only partial similarity is found.

Figure 4.

Schematic showing the domain organization of two proteins that share similarity over part of their length: they contain a pair of CBS domains in common.

Figure 5.

Multiple‐sequence alignment of CBS domains. The leftmost column gives the protein identifier followed by two numbers that give the start and end positions of the domain in the whole sequence. The sequence of each domain is represented as a single row. Conserved positions are shaded to highlight potentially interesting regions of the proteins. Different shades represent the different physicochemical properties of groups of amino acids.
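The shading described in this caption can be approximated in a few lines. The following is a minimal sketch, not the method used to produce the figure: the amino acid grouping, the 80% threshold, the function name `conserved_columns` and the toy alignment are all assumptions made for illustration, and the toy rows are not real CBS domain sequences.

```python
from collections import Counter

# Assumed, simplified physicochemical grouping (illustrative only;
# alignment viewers use a variety of different schemes).
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYH",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
AA_TO_GROUP = {aa: name for name, aas in GROUPS.items() for aa in aas}


def conserved_columns(rows, threshold=0.8):
    """Map column index -> group name for columns in which at least
    `threshold` of all rows carry a residue from the same group.
    Gap characters ('-') count against conservation."""
    if not rows:
        return {}
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    conserved = {}
    for col in range(width):
        counts = Counter(
            AA_TO_GROUP[row[col]] for row in rows if row[col] in AA_TO_GROUP
        )
        if not counts:
            continue  # column consists only of gaps or unknown symbols
        group, n = counts.most_common(1)[0]
        if n / len(rows) >= threshold:
            conserved[col] = group
    return conserved


# Hypothetical aligned rows, one domain per row, '-' marks a gap.
toy_alignment = [
    "VKDLMT-G",
    "IRELMS-G",
    "LKDVMTAG",
    "VRDIFSQG",
]

for col, group in sorted(conserved_columns(toy_alignment).items()):
    print(col, group)
```

Each reported column would receive the shade assigned to its group; column 6 of the toy alignment is left unshaded because it is mostly gaps.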

Figure 6.

Phylogenetic tree of the PALP enzyme family proteins. The topmost sequence in the tree, labeled ‘QUERY SEQUENCE’, is the protein for which we wish to know the function. The tree has been divided into four major functional groups: group I are threonine synthases, group II are cysteine synthases, group III are tryptophan synthases and group IV are serine and threonine deaminases.
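The idea of reading a function off a tree can be sketched as follows: find the smallest clade that contains the query and look at the annotations of its other members. This is a simplified illustration under stated assumptions. The nested-tuple tree, the leaf names, the annotations and the helper names (`leaves`, `smallest_clade_with`, `infer_function`) are hypothetical and do not reproduce the tree in Figure 6 or the placement of its query sequence.

```python
def leaves(node):
    """Return all leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def smallest_clade_with(node, query):
    """Return the smallest subtree holding the query plus at least one
    other leaf, or None if the query is not under this node."""
    if isinstance(node, str):
        return None
    for child in node:
        if query in leaves(child):
            deeper = smallest_clade_with(child, query)
            return deeper if deeper is not None else node
    return None


def infer_function(tree, query, annotations):
    """Collect the annotations of the query's closest relatives."""
    clade = smallest_clade_with(tree, query)
    if clade is None:
        return set()
    return {
        annotations[name]
        for name in leaves(clade)
        if name != query and name in annotations
    }


# Hypothetical tree as nested tuples; leaf names are made up.
toy_tree = (
    (("QUERY", "seqA"), "seqB"),
    (("seqC", "seqD"), ("seqE", "seqF")),
)
toy_annotations = {
    "seqA": "threonine synthase",
    "seqB": "threonine synthase",
    "seqC": "cysteine synthase",
    "seqD": "cysteine synthase",
    "seqE": "tryptophan synthase",
    "seqF": "serine/threonine deaminase",
}

print(infer_function(toy_tree, "QUERY", toy_annotations))
```

A single annotation in the result suggests a confident transfer of function; several different annotations would indicate that the query sits between functional groups and needs closer inspection.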


Web Links

NCBI BLAST. This website provides tools to search DNA and protein sequences against all known protein and DNA sequences.

How to Cite

Bateman, Alex, and Mifsud, William (Sep 2005) Gene Families. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1038/npg.els.0005045]