Next‐Generation Sequencing in Cancer: Tools for Fusion Gene Detection


Next‐generation sequencing (NGS) technology has had a striking impact on genomics research, especially in complex diseases such as cancer. NGS allows the discovery of novel genomic alterations, screening for known driver mutations and structural alterations, and the study of interactions between genes and environmental factors. Structural alterations, including translocations, inversions, tandem duplications, deletions, insertions, chromothripsis and chromoplexy, can result in oncogenic fusion genes or fusion transcripts, which are important in carcinogenesis. Rapid advances in NGS technology and analytical tools have allowed them to be used routinely in molecular diagnosis and personalised therapy. However, NGS technologies still face significant challenges, such as analysis time, false‐positive rate and data storage, that need to be overcome in the near future. This article discusses the current analytical tools for the detection of fusion genes and reviews their pros and cons.

Key Concepts

  • Fusion genes/transcripts are common events in epithelial tumours.
  • Whole‐transcriptome sequencing (RNA‐seq) is the most commonly selected method for fusion‐transcript detection.
  • Several algorithms are used for detection of fusion transcripts.
  • Mapping algorithms and filtering are crucial for detection of true fusion transcripts.
  • Current fusion‐detection tools yield false‐positive and false‐negative results.
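
The role of filtering noted in the key concepts can be illustrated with a minimal sketch. It is not the method of any tool discussed in this article. It assumes a hypothetical candidate record carrying two commonly used kinds of evidence: spanning read pairs (one mate mapped to each partner gene) and split reads (a single read crossing the junction). The names FusionCandidate and filter_candidates, the gene labels, the read counts and the thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCandidate:
    """Hypothetical candidate record (not any real tool's output format)."""
    gene5: str            # 5' partner gene
    gene3: str            # 3' partner gene
    spanning_pairs: int   # read pairs with one mate mapped to each gene
    split_reads: int      # reads that cross the fusion junction itself


def filter_candidates(candidates, min_spanning=2, min_split=1,
                      blacklist=frozenset()):
    """Keep candidates with enough read support and no blacklisted gene pair.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    kept = []
    for c in candidates:
        if c.gene5 == c.gene3:
            continue  # both ends in the same gene: not a two-gene fusion
        if (c.gene5, c.gene3) in blacklist:
            continue  # pair flagged by the user as a likely mapping artefact
        if c.spanning_pairs >= min_spanning and c.split_reads >= min_split:
            kept.append(c)
    return kept


# Toy input with placeholder gene names and made-up read counts.
candidates = [
    FusionCandidate("GENE_A", "GENE_B", spanning_pairs=12, split_reads=5),
    FusionCandidate("GENE_C", "GENE_D", spanning_pairs=1, split_reads=0),
    FusionCandidate("GENE_E", "GENE_E", spanning_pairs=30, split_reads=9),
    FusionCandidate("GENE_F", "GENE_G", spanning_pairs=8, split_reads=3),
]

kept = filter_candidates(candidates, blacklist=frozenset({("GENE_F", "GENE_G")}))
for c in kept:
    print(c.gene5, c.gene3, c.spanning_pairs, c.split_reads)
```

Raising min_spanning or min_split in this sketch discards more weakly supported candidates, which lowers the false‐positive rate at the cost of more false negatives; lowering them has the opposite effect.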

Keywords: next‐generation sequencing; fusion transcript; RNA‐seq; whole‐genome sequencing; fusion‐detection tools


Further Reading

Edwards PA (2010) Fusion genes and chromosome translocations in the common epithelial cancers. Journal of Pathology 220: 244–254.

Medves S and Demoulin JB (2012) Tyrosine kinase gene fusions in cancer: translating mechanisms into targeted therapies. Journal of Cellular and Molecular Medicine 16: 237–248.

How to Cite
Tuna, Musaffe (May 2015) Next‐Generation Sequencing in Cancer: Tools for Fusion Gene Detection. In: eLS. John Wiley & Sons Ltd, Chichester. [doi: 10.1002/9780470015902.a0025848]