Yesterday the paper describing the TCGA Pan-Cancer Project was published in Nature Genetics. We’ve had the opportunity to participate in this exciting project, and here I would like to describe our experience and our contribution to it.
We have long been interested in studying patterns of genomic alterations in cancer across tumor types. A project like TCGA Pan-Cancer therefore gave us an excellent opportunity to apply our tools and expertise to a unique collection of data.
In the past few years we have developed computational methods to identify cancer drivers by analyzing the patterns of somatic mutations across tumors (OncodriveFM and OncodriveCLUST), as well as tools to facilitate the visual exploration of multidimensional cancer genomics datasets (Gitools and IntOGen; see our review on this topic if you are interested). We now had the opportunity to apply those tools to TCGA Pan-Cancer data.
What does TCGA Pan-Cancer data consist of?
Integrated data set for comparing and contrasting multiple tumor types. Figure from the Nature Genetics article; see the article for details.
The TCGA Pan-Cancer project assembled data from more than 3000 patients with primary tumors from different organs, covering 12 tumor types. In each of these tumors a number of omics technologies were applied to obtain complete genomics, transcriptomics, proteomics and epigenomics profiles of the tumors.
Collaboration and teleconferences
David Tamborero, Nuria Lopez-Bigas and Abel Gonzalez-Perez
The TCGA Pan-Cancer collaborative project works through regular teleconferences (usually on Thursdays at 2pm ET) with all the members of the consortium and collaborators. In each teleconference different groups present the results of their analyses. In our case, being in Barcelona, the time of the teleconference was quite inconvenient (8pm) for work-life balance, but alternating between David, Abel and myself we managed to attend most of the teleconferences and to present the progress of our work to other researchers several times.
Data and intermediate results generated by different groups are shared through the Synapse platform. There is a nice paper describing the use of Synapse for the collaborative work within TCGA Pan-Cancer (Omberg et al., Nature Genetics 45, 1125-1126).
We tried, as much as we could, to use our tools and expertise to extract interesting knowledge from the valuable data generated by the TCGA consortium. In total we contributed to the project with 4 different results:
Authors of IntOGen-mutations. From left to right, Michael P. Schroeder, David Tamborero, Nuria Lopez-Bigas, Abel Gonzalez-Perez, Jordi Deu-Pons and Christian Perez-Llamas. Two more authors of IntOGen-mutations missing in the picture are Alba Jene-Sanz and Alberto Santos.
IntOGen-mutations is a web platform for cancer genome interpretation. It analyzes not only TCGA Pan-Cancer data but also additional datasets generated by other initiatives, such as those included within the International Cancer Genome Consortium. In the current version users can retrieve driver mutations, genes and pathways acting on 4623 tumors covering 13 cancer sites. They can also analyze newly sequenced tumor genomes and identify relevant mutations by putting them in the context of the accumulated knowledge.
Probably the most interesting feature of IntOGen-mutations is that it provides a comprehensive view of cancer vulnerabilities across cancer types, which was not available before. Tumor re-sequencing projects usually report a list of cancer drivers identified with differing criteria and methodologies, which makes it difficult to get a complete view of which genes are drivers in each cancer type. It is now possible to have this comprehensive view with IntOGen-mutations.
We have designed IntOGen-mutations to be updated regularly and to scale to the analysis of much larger cohorts of tumors, so that we can keep up with the expected increase in the number of sequenced tumor genomes/exomes available. With each update we will thus obtain a more complete view of cancer drivers across tumor types.
To learn more about this project, you can read this previous blog post.
TCGA data visualization using Interactive Heat-maps (Gitools)
One important challenge posed by the large and complex data generated by the TCGA Pan-Cancer project is how to give researchers a way to explore it and extract useful knowledge from it. We have worked on this topic before, and we propose the use of Interactive Heat-maps (read more on that). We have prepared all the TCGA data so that it can be navigated with Gitools interactive heat-maps. See the video below to learn how to use it.
Comprehensive identification of mutational cancer driver genes
One of the main advantages of analyzing the aggregated data of more than 3000 tumors across 12 tumor types is that it provides increased statistical power to distinguish driver mutations from passenger ones. In collaboration with other researchers of the Pan-Cancer project we have analyzed the mutational patterns of genes across tumors in search of signals of positive selection that point to candidate cancer drivers. Integrating additional data generated by the consortium has allowed us to obtain a reliable list of 291 mutational drivers acting in one or more of the 12 cancer types (accounting for 3,205 tumors). We have confirmed and extended the role of known cancer genes, and we have identified novel candidates that complete the mutational landscape of these diseases. The article describing this work will be published next week. You can browse the results of this project in IntOGen (at http://www.intogen.org/tcga) and also using Gitools (at http://www.gitools.org/datasets). The results are also available in Synapse (syn1962006).
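To make the statistical power argument concrete, here is a toy sketch in Python. It is not OncodriveFM, OncodriveCLUST or any method used in the project: it is a plain one-sided binomial test on how often a gene is mutated, and the background rate (1%), the gene's mutation frequency (2%) and the single-tumor-type cohort size (250) are made-up numbers chosen only for illustration. The names binomial_upper_tail and recurrence_p_value are hypothetical helpers.

```python
import math


def binomial_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def recurrence_p_value(n_tumors, gene_freq, background_rate):
    """P-value for seeing a gene mutated in gene_freq of n_tumors
    if mutations only arose at the background rate (toy model)."""
    observed = round(gene_freq * n_tumors)
    return binomial_upper_tail(n_tumors, observed, background_rate)


# Assumed, illustrative numbers (not taken from the Pan-Cancer data).
BACKGROUND_RATE = 0.01   # chance a tumor carries a passenger hit in the gene
GENE_FREQ = 0.02         # observed fraction of tumors with the gene mutated

p_single = recurrence_p_value(250, GENE_FREQ, BACKGROUND_RATE)    # one tumor type
p_pooled = recurrence_p_value(3000, GENE_FREQ, BACKGROUND_RATE)   # pooled cohort

print(f"single tumor type (n=250):  p = {p_single:.3g}")
print(f"pooled cohort     (n=3000): p = {p_pooled:.3g}")
```

With the same mutation frequency, the excess over background is not distinguishable from chance in the small cohort but becomes very clear in the pooled one. Real driver detection has to deal with much more than this (gene length, heterogeneous mutation rates, functional impact, clustering of mutations), which is why dedicated methods are needed.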
The mutational landscape of chromatin regulatory factors across 4623 tumor samples
Chromatin regulatory factors are emerging as important genes in cancer development and are regarded as interesting candidates for novel targets for cancer treatment. For this reason, and also because of our group's previous interest in them, we focused our efforts on studying the mutational landscape of this class of genes. For this we used all TCGA Pan-Cancer mutational data and additional datasets, adding up to 4623 tumors. You can read more on that in a recent post.
The current state of the art of the technology provides an unprecedented opportunity to understand tumor biology. Most importantly, this should eventually lead to better treatments for the disease. At this moment, the bottleneck of oncogenomics is not producing the data but interpreting it in order to retrieve useful knowledge. Initiatives such as the Pan-Cancer project are at the front line of these analyses, sifting the important information from the huge number of alterations observed in a tumor cell. Many challenges must be solved before all this information can improve the clinical management of cancer patients. Although many success stories have already been incorporated into clinical practice, we are only now coming to understand the complexity of the disease and how much effort is required to disentangle its mechanisms. There is a long way to go, but we believe that we are heading in the right direction.