High‐Throughput Automated Subcellular Localisation


Defining the subcellular localisation of the proteome of an organism of interest is a critical next step following genome sequencing. Knowledge of protein subcellular localisation provides insight into how normal cells function, as well as how they behave in disease states. However, gene isoforms, alternative splicing and posttranslational modifications significantly increase the number of protein variants encoded by a single gene, making this a complex task. Over the last 20 years, parallel approaches have all contributed to the systematic analysis of protein localisation. These include subcellular fractionation combined with mass spectrometry, the synthesis of large libraries of open reading frames fused to genes encoding fluorescent proteins, and the production of thousands of antibodies. Alongside these methods, improved bioinformatic predictors and machine learning and deep learning algorithms have become essential tools. Combining these approaches now brings us close to systematically defining the subcellular proteome of many organisms.
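In fractionation‐based approaches, the abundance of each protein is measured across a series of subcellular fractions. A protein is then commonly assigned to the compartment whose known marker proteins have a similar abundance profile. The Python sketch below illustrates that idea using Pearson correlation and a nearest‐marker rule. It is a simplified illustration under stated assumptions, not the method of any particular published study. The marker profiles, the `assign_compartment` helper and the 0.8 correlation cut‐off are all hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles.

    Correlation is scale-invariant, so profiles need not be normalised first.
    Returns 0.0 if either profile is flat (zero variance).
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def assign_compartment(profile, marker_profiles, min_r=0.8):
    """Assign a protein to the compartment with the best-correlated marker profile.

    marker_profiles maps a compartment name to a reference fractionation profile
    (hypothetical values). Returns (compartment, r), or (None, r) when the best
    correlation is below the hypothetical threshold min_r.
    """
    best, best_r = None, -1.0
    for compartment, marker in marker_profiles.items():
        r = pearson(profile, marker)
        if r > best_r:
            best, best_r = compartment, r
    return (best if best_r >= min_r else None), best_r


if __name__ == "__main__":
    # Hypothetical marker profiles across four fractions.
    markers = {
        "nucleus": [0.70, 0.20, 0.05, 0.05],
        "mitochondria": [0.10, 0.60, 0.20, 0.10],
        "cytosol": [0.05, 0.05, 0.20, 0.70],
    }
    print(assign_compartment([0.65, 0.25, 0.05, 0.05], markers))
```

The threshold leaves proteins unassigned when no marker profile fits well. This loosely mirrors how proteins that reside in several compartments, or that relocalise, resist a single label.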

Key Concepts

  • Subcellular localisation is a critical determinant in understanding protein function.
  • Data from genome sequencing projects provide the starting point for systematic approaches to determining protein localisation.
  • Parallel approaches using fluorescence microscopy are being applied in a high‐throughput manner to systematically reveal the subcellular localisation of large numbers of proteins in different cells.
  • The primary techniques used to determine protein localisation are mass spectrometry‐based proteomics, antibody‐based detection (immunofluorescence) and expression of fluorescently tagged proteins.
  • There is increasing use of computational biology tools to aid the automated classification of subcellular localisation.
  • Machine learning algorithms can interrogate large image datasets to assign proteins automatically to specific subcellular localisations.
  • Deep learning methods learn relevant image features directly from training data rather than relying on predefined, hand‐crafted features. They have become the newest tool for automatically assigning protein localisation from image sets.
  • Automated approaches combining both experimental and computational methods are likely to become the primary means by which subcellular localisation is determined from new cell systems.
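The machine learning classification of images described in the key concepts above can be illustrated with a deliberately simplified Python sketch. Two hand‐crafted features are measured per cell image, and a nearest‐centroid classifier assigns a compartment label. The feature choices, the nuclear mask input and the helper names (`extract_features`, `train_centroids`, `classify`) are hypothetical. Production pipelines typically measure many more features and use more powerful classifiers. Deep learning approaches go further by learning features from the pixels directly.

```python
import math
from statistics import mean, pstdev


def extract_features(image, nuclear_mask):
    """Compute two toy features from one cell image (2D list of intensities).

    nuclear_fraction: share of total intensity inside the nuclear mask.
    cv: coefficient of variation of pixel intensities, a crude proxy for
        how uneven or punctate the staining pattern is.
    """
    pixels = [v for row in image for v in row]
    total = sum(pixels)
    nuclear = sum(
        v
        for row, mask_row in zip(image, nuclear_mask)
        for v, inside in zip(row, mask_row)
        if inside
    )
    nuclear_fraction = nuclear / total if total else 0.0
    mu = mean(pixels)
    cv = pstdev(pixels) / mu if mu else 0.0
    return (nuclear_fraction, cv)


def train_centroids(examples):
    """Average the feature vectors of each label.

    examples is a list of (feature_vector, label) pairs, e.g. from images
    whose localisation was annotated manually (hypothetical training data).
    """
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(features, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

The two features are on different scales. Real workflows usually standardise each feature before computing distances, but that step is omitted here for brevity.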

Keywords: subcellular localisation; proteome; GFP‐tagging; immunofluorescence; high‐throughput automated imaging; high‐content analysis; machine learning; deep learning

Figure 1. Primary methods used to determine protein subcellular localisation. Diagram shows the relationship between the most commonly used wet‐lab and computational methods for determining the subcellular localisation of proteins in a cell. Created with BioRender.com.
Figure 2. Emerging and future directions for protein subcellular localisation applications. Diagram shows four examples of where protein localisation projects could be applied. From upper left:
- localisation could be systematically determined from tissue samples (rather than cultured cells);
- localisation could be determined from cells of a wider range of organisms (such as plants);
- other imaging modalities, such as electron microscopy and super‐resolution microscopy, could be used to refine our knowledge of localisation within substructures of compartments;
- patient samples could be used to assess mislocalisation of proteins, allowing this information to guide personalised medicine regimens.
Created with BioRender.com.

Further Reading

Bykov YS, Cohen N, Gabrielli N, et al. (2019) High‐throughput ultrastructure screening using electron microscopy and fluorescent barcoding. Journal of Cell Biology 218 (8): 2797–2811.

Chen SC, Zhao T, Gordon GJ, et al. (2007) Automated image analysis of protein localization in budding yeast. Bioinformatics 23 (13): i66–i71.

Coelho LP, Kangas JD, Naik AW, et al. (2013) Determining the subcellular location of new proteins from microscope images using local features. Bioinformatics 29 (18): 2343–2349.

Glory E and Murphy RF (2007) Automated subcellular location determination and high‐throughput microscopy. Developmental Cell 12 (1): 7–16.

Horton P, Park KJ, Obayashi T, et al. (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Research 35 (suppl_2): W585–W587.

Klausen MS, Jespersen MC, Nielsen H, et al. (2019) NetSurfP‐2.0: improved prediction of protein structural features by integrated deep learning. Proteins: Structure, Function, and Bioinformatics 87 (6): 520–527.

Liu G, Zhang WB, Qian G, et al. (2019) Bioimage‐based prediction of protein subcellular location in human tissue with ensemble features and deep networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics. DOI: 10.1109/TCBB.2019.2917429.

Rhee SY, Birnbaum KD and Ehrhardt DW (2019) Towards building a plant cell atlas. Trends in Plant Science 24 (4): 303–310.

How to Cite
Chalkley, Alannah S, Kelly, Suainibhe, Mysior, Margaritha M, and Simpson, Jeremy C (Aug 2020) High‐Throughput Automated Subcellular Localisation. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0020868]