OncodriveCLUST: a method to reveal cancer drivers based on mutation clustering

We have recently developed a novel method, named OncodriveCLUST, aimed at analysing the mutations observed in sets of tumor samples and identifying genes involved in the disease. It is based on the observation that driver mutations in cancer genes, especially oncogenes, often cluster in particular positions of the protein. We interpret this clustering as a sign that mutations in these regions alter protein function in a way that gives cancer cells an adaptive advantage, and are therefore positively selected during the clonal evolution of tumors. This property can thus be used to nominate driver genes.
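To make the idea of positional clustering concrete, here is a minimal Python sketch that groups recurrently mutated residues into clusters and scores them. It is not the published OncodriveCLUST implementation. The helper names, the thresholds `min_count` and `max_gap`, and the (√2)^distance down-weighting away from each cluster's peak residue are illustrative assumptions chosen for this example.

```python
from collections import Counter
from math import sqrt


def find_clusters(positions, min_count=2, max_gap=5):
    """Group recurrently mutated residues into clusters (hypothetical helper).

    positions: protein residue positions, one entry per observed mutation.
    min_count: assumed threshold; a residue needs at least this many mutations
        to be considered part of a cluster.
    max_gap: assumed maximum distance (in residues) between neighbouring
        recurrent positions that still belong to the same cluster.
    """
    counts = Counter(positions)
    recurrent = sorted(p for p, c in counts.items() if c >= min_count)
    clusters = []
    for p in recurrent:
        # Extend the current cluster if this residue is close enough to its last member.
        if clusters and p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return counts, clusters


def clustering_score(positions, min_count=2, max_gap=5):
    """Sum, over all clusters, of the fraction of the gene's mutations at each
    clustered residue, down-weighted by (sqrt 2) ** distance to the cluster peak."""
    if not positions:
        return 0.0
    counts, clusters = find_clusters(positions, min_count, max_gap)
    total = len(positions)
    score = 0.0
    for cluster in clusters:
        # The peak is the residue carrying the most mutations within the cluster.
        peak = max(cluster, key=lambda p: counts[p])
        for p in cluster:
            score += (counts[p] / total) / sqrt(2) ** abs(p - peak)
    return score


# Usage: a gene with a mutational hotspot versus one with evenly scattered mutations.
hotspot_gene = [12] * 6 + [13] * 2 + [61, 200]
scattered_gene = list(range(10, 400, 40))
print(clustering_score(hotspot_gene), clustering_score(scattered_gene))
```

With these hypothetical defaults, the hotspot gene gets a clearly positive score, whereas the gene whose mutations are spread along the protein scores zero because no residue is recurrently hit.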

Therefore, OncodriveCLUST identifies genes whose mutations show a stronger spatial clustering than expected. The method does not assume that the baseline mutation probability is homogeneous across all positions of a gene: non-random mutational processes are known to operate along the genome, so such an assumption is likely an oversimplification that would bias the detection of meaningful events. Instead, OncodriveCLUST builds a background model from synonymous mutations. This rests on the hypothesis that silent coding mutations are under no selective pressure and may therefore reflect the baseline clustering of somatic mutations.
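The role of the synonymous background can be sketched as follows. This is a minimal illustration that assumes a positional clustering score has already been computed, per gene, once from synonymous mutations and once from non-synonymous mutations. It pools the synonymous scores of many genes into a null distribution and reports a z-score and an empirical p-value for a gene's non-synonymous score. The function name and the choice of an empirical p-value with a +1 pseudo-count are our assumptions, not necessarily the statistics used in the published method.

```python
from statistics import mean, stdev


def score_against_background(observed, synonymous_scores):
    """Compare a gene's non-synonymous clustering score with a null distribution
    built from synonymous-mutation clustering scores (hypothetical helper).

    Returns (z_score, empirical_p_value).
    """
    if len(synonymous_scores) < 2:
        raise ValueError("need at least two background scores")
    mu = mean(synonymous_scores)
    sigma = stdev(synonymous_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    z = (observed - mu) / sigma
    # One-sided empirical p-value: how often silent mutations cluster at least
    # this strongly. The +1 pseudo-count keeps the p-value above zero.
    n_at_least = sum(1 for s in synonymous_scores if s >= observed)
    p = (n_at_least + 1) / (len(synonymous_scores) + 1)
    return z, p


# Usage with made-up background scores (illustration only, not real data).
background = [0.0, 0.1, 0.2, 0.1, 0.0, 0.3, 0.1, 0.2, 0.0, 0.1]
print(score_against_background(0.74, background))
```

Using an empirical rather than a parametric null avoids assuming any particular shape for the background distribution. The trade-off is that the smallest attainable p-value is limited by the number of background genes.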


Applied to the Catalogue of Somatic Mutations in Cancer (COSMIC), OncodriveCLUST retrieved a list of genes enriched in members of the Cancer Gene Census, underscoring the method's ability to select bona fide cancer genes. As expected, this list prioritized genes with a dominant phenotype (i.e. oncogenes), but it also highlighted some recessive cancer genes, which showed wider but still delimited mutation clusters. We then used OncodriveCLUST to analyse four data sets from The Cancer Genome Atlas (breast invasive carcinoma, lung squamous cell carcinoma, ovarian serous carcinoma and uterine corpus endometrioid carcinoma) and compared the results with those of two other methods for identifying driver genes: MutSig, which identifies genes that are recurrently mutated across cancer samples, and OncodriveFM, which identifies genes whose mutations are biased towards higher functional impact. In all these data sets, OncodriveCLUST identified genes well known to be involved in cancer, mainly through gain of function. Moreover, several of these genes were missed by the other methods, which rely on different criteria. This underscores the benefit of combining complementary methods to obtain the most reliable and comprehensive catalog of drivers.


In summary, elucidating the genes involved in cancer is a challenging task that requires combining approaches based on different criteria. In this regard, we showed that OncodriveCLUST complements existing methods well and should be considered when identifying both known cancer drivers and novel candidates. OncodriveCLUST has been published in Bioinformatics and is freely available as a Python script. It can also be run online as part of the IntOGen-mutations pipeline.



Tamborero D, Gonzalez-Perez A and Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics. 2013; doi: 10.1093/bioinformatics/btt395

