How to identify oncogenic driver mutations: a review of bioinformatics approaches

Recently, at least three articles have reviewed the range of bioinformatics tools developed in the past few years to understand the alterations that plague the genomes and transcriptomes of tumor cells. The first (chronologically), published in Nature Methods one year ago and co-authored by us within the ICGC mutation consequences and pathways working subgroup, focused on the annotation of cancer variants and on tools aimed at identifying driver mutations and driver genes. The other two, which appeared in Genome Medicine and Nature Reviews Genetics, covered tools involved in the whole process of cancer mutation analysis, from the calling of somatic mutations to the analysis of significantly mutated pathways or gene modules within gene interaction networks.


We were recently commissioned to write a review of bioinformatics approaches and tools specifically aimed at identifying driver mutations and genes in tumor genomes, for a special issue on “Cancer Genomics” of the Japanese scientific journal “Experimental Medicine”, organized by Dr. Tatsuhiro Shibata at the National Cancer Center, Tokyo. The article will appear in Japanese; however, the editors were kind enough to edit the English version and make it available online free of charge.


We structured the first two sections of the article around two of the main lines of our work in the past few years: the identification of driver mutations and the detection of driver genes. We treat the methods aimed at these two purposes separately to avoid a frequent confusion: while driver genes are expected to be enriched for driver mutations, the approaches taken to detect each are radically different. The search for driver mutations usually revolves around detecting amino acid residues that are key to protein function, often within a subset of proteins known or suspected to be involved in tumorigenesis. The identification of driver genes, on the other hand, relies on picking up the traces (or signals) left by positive selection across cohorts of tumor samples. One important distinction between these two families is therefore that methods for driver mutations can operate (and provide an assessment) at the level of individual mutations, whereas methods for driver genes assess each gene based on the collective analysis of all the somatic mutations it bears across the tumor cohort. The most popular tools designed for these two tasks (and several aimed at detecting significantly mutated sets of functionally interacting genes) are listed and described in Table 1 of the paper, which we hope will serve readers as a useful quick reference.
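To make the gene-level idea more concrete, here is a minimal sketch of one possible cohort-wide signal: testing whether a gene carries more somatic mutations across the cohort than expected from a background mutation rate. It assumes a single uniform per-base background rate and a Poisson model of mutation counts, both simplifications; the function names and numbers are hypothetical and this is not the method of any particular tool. Real driver-gene methods use more refined background models and other signals of positive selection (for example, a bias towards high functional impact or the clustering of mutations). Note that the resulting score belongs to the gene as a whole, not to any individual mutation.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), summing the upper tail directly."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0  # nothing is expected, so k >= 1 mutations is impossible under this model
    total = 0.0
    i = k
    while True:
        # Poisson pmf computed in log space to avoid overflow in i! and lam**i
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Beyond the mode the terms shrink monotonically; stop once they are negligible
        if i > lam and term <= total * 1e-15:
            break
        i += 1
    return min(total, 1.0)


def gene_recurrence_pvalue(observed: int, coding_length: int,
                           n_samples: int, background_rate: float) -> float:
    """Hypothetical gene-level test: is the cohort-wide mutation count in a gene
    higher than expected from a uniform per-base background rate?"""
    expected = coding_length * n_samples * background_rate
    return poisson_upper_tail(observed, expected)


if __name__ == "__main__":
    # Hypothetical cohort: 100 tumors, a 1,500-base coding sequence,
    # background rate of 1e-6 mutations per base per tumor (0.15 expected in total)
    print(gene_recurrence_pvalue(observed=10, coding_length=1500,
                                 n_samples=100, background_rate=1e-6))
```

With 10 observed mutations against 0.15 expected, the p-value is far below 1e-6, flagging the gene as a candidate for positive selection; a gene with no mutations gets a p-value of 1.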


In the third section, we review the limitations of the approaches described in the two previous sections. In brief, because these focus on point mutations (and short indels) in coding regions, many driver alterations, such as those affecting longer stretches of the genome or occurring exclusively in non-coding regions, are left uncovered. Finally, the fourth section presents our IntOGen-mutations platform and its application to the analysis of somatic mutations identified de novo, either in a single tumor genome or across multiple samples in a cohort.


As always, we hope you find our publications useful for your research, and we would be delighted to hear what you think of them!