Protein–Protein Interactions and Genetic Disease


High‐throughput experiments that measure protein–protein interactions have created a flood of proteomic information that parallels the influx of data from next‐generation sequencing technologies. The creation of whole‐organism protein interaction maps has opened new avenues for predicting genetic disease. Advances in network science now allow genes to be associated with disease directly from the characteristics of the protein map, without reference to the characteristics of the gene itself. However, none of these techniques has reached the ‘black box’ level, and all require careful consideration of the systematic errors in both the underlying experimental data and the computational methods to give reliable results. Here, we review the main methods to characterise protein interactions in vitro and in vivo, the methods by which protein networks are constructed, the characteristics of the major protein interaction databases and the techniques used to predict the functional impact of mutations on protein interaction networks.

Key Concepts

  • Protein–protein interactions are the driving force for cellular responses, and disruption of protein–protein interfaces can lead to disease.
  • The sequence conservation of functionally important residues across protein interfaces allows the prediction of disease‐associated mutations.
  • Disease‐associated mutations often affect protein–protein binding affinities, which can be measured with high accuracy in vitro by a variety of biophysical techniques using purified proteins.
  • High‐throughput techniques to measure protein interactions in vivo are prone to both high false‐positive and high false‐negative rates.
  • Each method of identifying protein interactions has its own systematic errors, which can be partly corrected by cross‐validating the results from two independent techniques.
  • Functional associations between genes can be inferred by bioinformatic approaches, but the results do not necessarily imply a physical interaction between proteins.
  • Protein–protein interaction databases may include both complexes between proteins and purely functional associations. This distinction must be kept in mind when analysing results from protein–protein interaction databases.
  • Disease‐associated genes tend to be clustered together in protein–protein interaction networks. It is possible to predict new genes for a disease by a random walk across the protein interaction network from genes already known to be associated with the disease.
  • Currently, protein–protein interaction surfaces are mostly considered undruggable by small molecules. This belief is changing with new concepts in drug design.
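
The random‐walk idea in the Key Concepts above can be sketched in a few lines. This is a minimal illustration using random walk with restart, one common formulation of network propagation, and is not necessarily the exact procedure any cited study used. The toy network, the protein names A to F, the restart probability of 0.3 and the function names are all assumptions made for the example.

```python
# Random walk with restart (RWR) on a toy, hypothetical interaction network.
# Protein names, edges and the restart probability are illustrative assumptions.

def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = sorted(neighbours)

    # The walker starts (and restarts) uniformly on the known disease genes.
    # Assumes every seed is present in the network.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m passes an equal share p[m] / degree(m) to n.
            inflow = sum(p[m] / len(neighbours[m]) for m in neighbours[n])
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:  # stop once the distribution has settled
            break
    return p


def rank_candidates(edges, seeds, **kwargs):
    """Rank non-seed proteins by their RWR score, highest first."""
    scores = random_walk_with_restart(edges, seeds, **kwargs)
    candidates = [n for n in scores if n not in seeds]
    return sorted(candidates, key=lambda n: (-scores[n], n))


toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
known_disease_genes = {"A", "B"}
ranking = rank_candidates(toy_edges, known_disease_genes)
print(ranking)
```

In this toy network, proteins closer to the seed genes receive higher scores, so the ranking simply reflects network proximity. On real data the result depends strongly on the quality of the underlying interaction map, which is why the systematic errors listed above matter.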

Keywords: protein–protein interaction; morbidity; protein networks; yeast two hybrid; affinity copurification; systems biology; network medicine; biological database; druggability; binding affinity

Figure 1. The yeast two‐hybrid assay. (a) The yeast two‐hybrid assay begins with the construction of a prey plasmid library. Each prey plasmid encodes a protein fused to the activation domain of a transcription factor, along with a selection marker to detect the successful incorporation of the plasmid into the cell. A bait plasmid is also constructed, encoding the protein of interest fused to the DNA (deoxyribonucleic acid) binding domain of the transcription factor, along with a second, orthogonal selection marker. (b) The yeast cells are then permeabilised to allow entry of the plasmid and transformation of the yeast genome. (c) Once transformed, the yeast are grown in selective media deficient in a nutrient that cells lacking the selection markers require. (d) Binding of the prey protein to the bait protein brings the activation domain into proximity of the reporter gene and activates transcription. (e) Colonies showing transcription of the reporter gene are selected. (f) The plasmids from the active colonies are extracted and (g) the DNA corresponding to the prey protein is sequenced.
Figure 2. Integration of different data sets into a protein–protein network. Accuracy can be increased by considering only the strict intersection of the data sets, where a positive PPI (protein–protein interaction) exists in every data set. Alternatively, the coverage can be extended by considering the union of the data sets, where a PPI is considered to exist if it is found in any data set. Weighted integration counts only the PPIs in each data set that are considered the most reliable after consulting an outside gold standard database.
Figure 3. Modified screenshot of an example query from the STRING database. Example of a protein network from the STRING database using the KRas protein, an oncogene implicated in the development of many cancers. The colour of the lines connecting the protein nodes indicates the particular lines of evidence used in establishing a functional association whereas the distance between the nodes is a measure of the confidence of the interaction as established by the Bayesian scoring system. Predicted GO pathways are also available (not shown).
Figure 4. Comparison of PPI interfaces and small molecule binding pockets. (a) A large and shallow PPI with three potential hot spots for small molecule binding. (b) Enzyme binding pocket. Note the smaller size and greater depth of the enzyme binding pocket in comparison with the PPI interface. Adapted from Zerbe (2012) © American Chemical Society.
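
The three integration strategies described in the caption of Figure 2 can be illustrated with a small sketch. All protein names, interaction calls and the gold standard below are hypothetical. The weighting scheme (each data set weighted by the fraction of its calls found in the gold standard, with a score threshold of 0.5) is a crude stand‐in chosen for brevity, not the scoring used by any particular database.

```python
# Toy integration of two hypothetical PPI data sets.
# Every name, call and threshold here is an illustrative assumption.

def pair(a, b):
    """Store an interaction as an unordered pair so that A-B equals B-A."""
    return frozenset((a, b))


datasets = {
    "y2h": {pair("P1", "P2"), pair("P2", "P3"), pair("P3", "P4")},
    "apms": {pair("P2", "P1"), pair("P3", "P4"), pair("P4", "P5")},
}
gold_standard = {pair("P1", "P2"), pair("P4", "P5")}

# Strict intersection: higher accuracy, lower coverage.
strict = set.intersection(*datasets.values())

# Union: higher coverage, lower accuracy.
broad = set.union(*datasets.values())


def reliability(calls, gold):
    """Fraction of a data set's calls that are confirmed by the gold standard."""
    return len(calls & gold) / len(calls)


weights = {name: reliability(calls, gold_standard) for name, calls in datasets.items()}


def weighted_integration(datasets, weights, threshold=0.5):
    """Keep PPIs whose summed data-set weights reach the threshold."""
    scores = {}
    for name, calls in datasets.items():
        for ppi in calls:
            scores[ppi] = scores.get(ppi, 0.0) + weights[name]
    return {ppi for ppi, score in scores.items() if score >= threshold}


weighted = weighted_integration(datasets, weights)
print(len(strict), len(broad), len(weighted))
```

Here the weighted result sits between the two extremes: it keeps an interaction seen only in the more reliable data set while discarding one seen only in the less reliable data set.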

References

Albert R, Jeong H and Barabasi AL (2000) Error and attack tolerance of complex networks. Nature 406: 378–382.

Amberger J, Bocchini CA, Scott AF and Hamosh A (2009) McKusick's Online Mendelian Inheritance in Man (OMIM®). Nucleic Acids Research 37: D793–D796.

Berliner N, Teyra J, Colak R, Lopez SG and Kim PM (2014) Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation. PLoS One 9: e107353.

Brender JR and Zhang Y (2015) Predicting the effect of mutations on protein‐protein binding interactions through structure‐based interface profiles. PLoS Computational Biology 11 (10): e1004494.

Chatr‐aryamontri A, Oughtred R, Boucher L, et al. (2017) The BioGRID interaction database: 2017 update. Nucleic Acids Research 45: D369–D379.

Choi Y, Sims GE, Murphy S, Miller JR and Chan AP (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7: e46688.

Das J and Yu H (2012) HINT: high‐quality protein interactomes and their applications in understanding human disease. BMC Systems Biology 6: 92.

Dourado DFAR and Flores SC (2014) A multiscale approach to predicting affinity changes in protein‐protein interfaces. Proteins 82: 2681–2690.

Driggers EM, Hale SP, Lee J and Terrett NK (2008) The exploration of macrocycles for drug discovery–an underexploited structural class. Nature Reviews. Drug Discovery 7: 608–624.

Fraser HB, Hirsh AE, Wall DP and Eisen MB (2004) Coevolution of gene expression among interacting proteins. Proceedings of the National Academy of Sciences of the United States of America 101: 9033–9038.

Gingras AC, Gstaiger M, Raught B and Aebersold R (2007) Analysis of protein complexes using mass spectrometry. Nature Reviews Molecular Cell Biology 8: 645–654.

Hermjakob H, Montecchi‐Palazzi L, Lewington C, et al. (2004) IntAct: an open source molecular interaction database. Nucleic Acids Research 32: D452–D455.

Ho Y, Gruhler A, Heilbut A, et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180–183.

Ivanic J, Yu X, Wallqvist A and Reifman J (2009) Influence of protein abundance on high‐throughput protein‐protein interaction detection. PLoS One 4: e5815.

Jansen R, Yu H, Greenbaum D, et al. (2003) A Bayesian networks approach for predicting protein‐protein interactions from genomic data. Science 302: 449–453.

Jansen R and Gerstein M (2004) Analyzing protein function on a genomic scale: the importance of gold‐standard positives and negatives for network prediction. Current Opinion in Microbiology 7: 535–545.

Johnson RM, Harrison SD and Maclean D (2011) Therapeutic applications of cell‐penetrating peptides. Methods in Molecular Biology 683: 535–551.

Juan D, Pazos F and Valencia A (2008) High‐confidence prediction of global interactomes based on genome‐wide coevolutionary networks. Proceedings of the National Academy of Sciences of the United States of America 105: 934–939.

Kohler S, Bauer S, Horn D and Robinson PN (2008) Walking the interactome for prioritization of candidate disease genes. American Journal of Human Genetics 82: 949–958.

Kortemme T, Kim DE and Baker D (2004) Computational alanine scanning of protein‐protein interfaces. Science's STKE 2004: pl2.

Mitsopoulos C, Schierz AC, Workman P and Al‐Lazikani B (2015) Distinctive behaviors of druggable proteins in cellular networks. PLoS Computational Biology 11: e1004597.

Ng PC and Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Research 11: 863–874.

Nikolovska‐Coleska Z (2015) Studying protein‐protein interactions using surface plasmon resonance. Methods in Molecular Biology 1278: 109–138.

Nolan GP (2007) What's wrong with drug screening today. Nature Chemical Biology 3: 187–191.

Oti M, Snel B, Huynen MA and Brunner HG (2006) Predicting disease genes using protein‐protein interactions. Journal of Medical Genetics 43: 691–698.

Pagel P, Kovac S, Oesterheld M, et al. (2005) The MIPS mammalian protein‐protein interaction database. Bioinformatics 21: 832–834.

Papanikolaou N, Pavlopoulos GA, Theodosiou T and Iliopoulos I (2015) Protein‐protein interaction predictions using text mining methods. Methods 74: 47–53.

Pazos F and Valencia A (2001) Similarity of phylogenetic trees as indicator of protein‐protein interaction. Protein Engineering 14: 609–614.

Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D and Yeates TO (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proceedings of the National Academy of Sciences of the United States of America 96: 4285–4288.

Peri S, Navarro JD, Amanchy R, et al. (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Research 13: 2363–2371.

Raman K, Yeturu K and Chandra N (2008) targetTB: a target identification pipeline for Mycobacterium tuberculosis through an interactome, reactome and genome‐scale structural analysis. BMC Systems Biology 2: 109.

Schymkowitz J, Borg J, Stricher F, et al. (2005) The FoldX web server: an online force field. Nucleic Acids Research 33: W382–W388.

Scott DE, Bayly AR, Abell C and Skidmore J (2016) Small molecules, big targets: drug discovery faces the protein‐protein interaction challenge. Nature Reviews. Drug Discovery 15: 533–550.

Stumpf MP, Thorne T, de Silva E, et al. (2008) Estimating the size of the human interactome. Proceedings of the National Academy of Sciences of the United States of America 105: 6959–6964.

Stynen B, Tournu H, Tavernier J and Van Dijck P (2012) Diversity in genetic in vivo methods for protein‐protein interaction studies: from the yeast two‐hybrid system to the mammalian split‐luciferase system. Microbiology and Molecular Biology Reviews 76: 331–382.

Szilagyi A and Zhang Y (2014) Template‐based structure modeling of protein‐protein interactions. Current Opinion in Structural Biology 24: 10–23.

Szklarczyk D, Morris JH, Cook H, et al. (2017) The STRING database in 2017: quality‐controlled protein‐protein association networks, made broadly accessible. Nucleic Acids Research 45: D362–D368.

Talavera D, Robertson DL and Lovell SC (2013) The role of protein interactions in mediating essentiality and synthetic lethality. PLoS One 8: e62866.

Tu ZD, Wang L, Xu M, et al. (2006) Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC Genomics 7: 31.

Van Criekinge W and Beyaert R (1999) Yeast two‐hybrid: state of the art. Biological Procedures Online 2: 1–38.

Velazquez‐Campoy A, Leavitt SA and Freire E (2015) Characterization of protein‐protein interactions by isothermal titration calorimetry. Methods in Molecular Biology 1278: 183–204.

Vidalain PO, Boxem M, Ge H, Li S and Vidal M (2004) Increasing specificity in high‐throughput yeast two‐hybrid experiments. Methods 32: 363–370.

Walhout AJ and Vidal M (1999) A genetic strategy to eliminate self‐activator baits prior to high‐throughput yeast two‐hybrid screens. Genome Research 9: 1128–1134.

Wass MN, Fuentes G, Pons C, Pazos F and Valencia A (2011) Towards the prediction of protein interaction partners using physical docking. Molecular Systems Biology 7: 469.

Wells JA and McClendon CL (2007) Reaching for high‐hanging fruit in drug discovery at protein‐protein interfaces. Nature 450: 1001–1009.

Xiong P, Zhang CX, Zheng W and Zhang Y (2017) BindProfX: assessing mutation‐induced binding affinity change by protein interface profiles with pseudo‐counts. Journal of Molecular Biology 429: 426–434.

Yu HY, Luscombe NM, Lu HX, et al. (2004) Annotation transfer between genomes: protein‐protein interologs and protein‐DNA regulogs. Genome Research 14: 1107–1118.

Zerbe BS, Hall DR, Vajda S, Whitty A and Kozakov D (2012) Relationship between hot spot residues and ligand binding hot spots in protein‐protein interfaces. Journal of Chemical Information and Modeling 52: 2236–2244.

Zhang QC, Petrey D, Deng L, et al. (2012) Structure‐based prediction of protein‐protein interactions on a genome‐wide scale. Nature 490: 556–560.

Zhao H and Beckett D (2008) Kinetic partitioning between alternative protein‐protein interactions controls a transcriptional switch. Journal of Molecular Biology 380: 223–236.

Further Reading

Bader GD and Hogue CWV (2002) Analyzing yeast protein‐protein interaction data obtained from different sources. Nature Biotechnology 20: 991–997.

Franzosa E, Linghu B and Xia Y (2009) Computational reconstruction of protein‐protein interaction networks: algorithms and issues. Methods in Molecular Biology 541: 89–100.

Goh KI, Cusick ME, Valle D, et al. (2007) The human disease network. Proceedings of the National Academy of Sciences of the United States of America 104: 8685–8690.

Jeong H, Mason SP, Barabasi AL and Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411: 41–42.

de Juan D, Pazos F and Valencia A (2013) Emerging methods in protein co‐evolution. Nature Reviews. Genetics 14: 249–261.

Klingstrom T and Plewczynski D (2011) Protein‐protein interaction and pathway databases, a graphical review. Briefings in Bioinformatics 12: 702–713.

Raman K (2010) Construction and analysis of protein‐protein interaction networks. Automated Experimentation 2: 2.

Sanderson CM (2009) The Cartographers toolbox: building bigger and better human protein interaction networks. Briefings in Functional Genomics & Proteomics 8: 1–11.

Snider J, Kotlyar M, Saraon P, et al. (2015) Fundamentals of protein interaction network mapping. Molecular Systems Biology 11 (12): 848.

Stelling J, Sauer U, Szallasi Z, Doyle FJ and Doyle J (2004) Robustness of cellular functions. Cell 118: 675–685.

How to Cite
Brender, Jeffrey R, and Zhang, Yang (Oct 2017) Protein–Protein Interactions and Genetic Disease. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0026856]