Last night, our IntOGen-mutations paper went online in Nature Methods. A couple of years of our team's hard work are now condensed into a couple of pages. Let us use this blog to unfold the story so tightly wrapped up within those pages.
It all started a few years ago with the development of the first IntOGen (published in Nature Methods in 2010), which focuses on the analysis of genes and pathways affected by expression and copy number changes in tumors across projects and cancer types. We decided to extend this analysis to tumor somatic mutations, following a similar concept. Therefore, we started by developing the OncodriveFM algorithm to detect genes that accumulate functional mutations across tumor samples. To validate the algorithm, we tested it on the glioblastoma and ovarian carcinoma datasets published by TCGA and on the chronic lymphocytic leukemia dataset published by the ICGC. We were then ready to analyze other publicly available somatic mutation datasets and compile all candidate driver genes picked up by OncodriveFM in different malignancies into an unbiased catalog of cancer genes. I started collecting the datasets and writing the scripts (I was still a PERList at the time) to transform the lists of mutations into functional impact scores that OncodriveFM could use. The task required mapping all lists of mutations to the coordinates of the hg19 assembly of the human genome (thank you, UCSC, for liftOver!) and welding together the Ensembl Variant Effect Predictor and tools that predict the functional impact of mutations.
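OncodriveFM's central idea, as described above, is that driver genes accumulate mutations with high functional impact across tumor samples. A minimal sketch of that idea could look like the following, assuming each mutation already carries a single functional impact score (higher means more damaging) and using a simple random-sampling null model. The function name `fm_bias_pvalue`, its parameters and the toy scores are hypothetical; the published OncodriveFM method differs in its details.

```python
import random
from statistics import mean


def fm_bias_pvalue(gene_scores, all_scores, n_draws=10000, seed=0):
    """Empirical one-sided p-value that a gene's mutations have a higher mean
    functional impact score than mutations drawn at random from all samples.

    Simplified illustration only, not OncodriveFM's actual implementation."""
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    # Count random sets of k mutations whose mean impact is at least as high.
    hits = sum(
        mean(rng.sample(all_scores, k)) >= observed for _ in range(n_draws)
    )
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_draws + 1)


# Hypothetical scores in [0, 1] for every mutation in the cohort.
background = [i / 100 for i in range(100)]
print(fm_bias_pvalue([0.95, 0.90, 0.99, 0.97], background))  # small p-value
print(fm_bias_pvalue([0.10, 0.20, 0.30], background))        # large p-value
```

A gene whose mutations are consistently predicted to be damaging gets a small p-value, even if it is not mutated unusually often; that is what distinguishes this kind of functional impact bias from a pure mutation-frequency test.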
At first sight, the job seemed like a piece of cake but, as always, the devil was in the details. I kept running into small anomalies in the intermediate files of the incipient pipeline that could be traced back to minor differences in the annotation formats used by different projects. Some projects, for example, didn't annotate the strand of the mutation, and different projects used different notations for indels. Eventually, I was overwhelmed by the task of coding a pipeline with anything close to acceptable execution efficiency, and it became crystal clear that the project required the skills of a software engineer. And that is how first Alberto and then Christian got involved in the project, with the task of developing a Wok-based pipeline to efficiently analyze datasets of cancer somatic mutations and detect putative driver genes in them. Once Christian had the pipeline up and running, it was easy to start incorporating new analyses and pieces of software into what we were beginning to call IntOGen-mutations. The most important was OncodriveCLUST, a new member of our family of driver-nominating methods. Finally, it was time to work on the visualization of the data generated by the pipeline. Jordi took care of it with Onexus, and patiently accommodated every one of our endless requests, big and small, to make the interface simpler and friendlier.
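One way to tame the format differences described above is to convert every mutation into a single canonical representation before annotation. The sketch below assumes VCF-style input (1-based position, indels anchored on a preceding base) and emits one line in the Ensembl VEP default input format (chromosome, start, end, REF/ALT allele, strand), where an insertion is written with start = end + 1 and `-` stands for the empty allele. The function name `to_vep_default` and the choice to treat a missing strand as `+` are assumptions for illustration, not what the IntOGen-mutations pipeline actually does.

```python
def to_vep_default(chrom, pos, ref, alt, strand=None):
    """Convert a VCF-style variant (1-based pos, anchored indels) into one
    tab-separated line of VEP's default input format. Hypothetical helper."""
    # Trim the leading bases shared by REF and ALT (e.g. the VCF anchor base).
    i = 0
    while i < len(ref) and i < len(alt) and ref[i] == alt[i]:
        i += 1
    ref, alt, pos = ref[i:], alt[i:], pos + i
    start = pos
    # For insertions ref is now empty, so end = start - 1 as VEP expects.
    end = pos + len(ref) - 1
    allele = f"{ref or '-'}/{alt or '-'}"
    # ASSUMPTION: a missing strand annotation means the forward strand.
    return "\t".join([str(chrom), str(start), str(end), allele, strand or "+"])


print(to_vep_default("1", 881906, "A", "AC"))  # insertion of C
print(to_vep_default("2", 100, "AG", "A"))     # deletion of G
```

This only removes the anchor-base convention; it does not left-align indels inside repeats, which would need the reference sequence.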
Some views of the IntOGen-mutations browser
In the end, we completed the platform, called IntOGen-mutations, which consists of an automatic pipeline to analyze somatic mutations detected either in a single patient or across a cohort of tumors, and a web discovery tool containing the results of running that pipeline on mutations from 4,623 tumor samples. The web discovery tool thus contains information on the genes and pathways that drive tumorigenesis in tumor types from 13 anatomical sites. The platform may be useful primarily to three types of researchers:
- First, cancer researchers interested in a gene or group of genes may query the web discovery tool (see video tutorial) and find out in which tumor types they act as drivers
- Second, groups sequencing cohorts of cancer genomes may use the pipeline to detect genes that act as drivers in their tumor samples, browse the results through a private IntOGen-like website (see video tutorial), and compare them to the knowledge accumulated in the web discovery tool
- Third, clinically oriented researchers who sequence a patient's tumor may use the pipeline to rank its mutations by putative functional impact, browse the results through a private IntOGen-like website (see video tutorial), and compare them to the knowledge accumulated in the web discovery tool
So, without further ado, we give you IntOGen-mutations, where you can browse a catalog of cancer somatic mutations, analyze your own datasets, inspect plots, and visualize results as heatmaps. Enjoy!