Molecular subtypes of human cancer

Cancers are typically classified according to their tissue of origin. However, large-scale genomic studies are providing more detailed molecular characterizations of tumors, opening up the possibility of a more accurate classification based on molecular profiles. Our group has recently participated in the pan-cancer integrated subtypes study, published online today in Cell, which builds a molecular taxonomy of cancer using the comprehensive multi-platform assays provided by the TCGA consortium for 12 diverse cancer types. This study represents an unprecedented effort to classify cancer by refining the molecular portrait of human malignancies.

As a result, 11 molecular subtypes have emerged that mainly reflect the cell of origin of the malignancy, either a cell type at a specific developmental stage or a cell type with a defined function. In detail, the largest differences were observed between cancers of epithelial and non-epithelial origin, followed by the differences between epithelial cancers arising from basal layer-like cells and those arising from cells with secretory functions. This led tumor samples from the same tissue to split into different subtypes, and samples from multiple tissues to coalesce into the same group. For instance, several tumor types were grouped in the squamous-like cluster, which appears to arise from an epithelial cell type common to diverse tissues that contain environmentally exposed epithelial surfaces (e.g. oral cavity, lungs, and bladder). On the other hand, bladder cancer was the most heterogeneous of the malignancies included in the study: its samples were mainly distributed across three subtypes that correlated with different clinical outcomes. Interestingly, the study also found particular alteration signatures shared by histologically distinct epithelial malignancies, which might be exploited with the same therapeutic regimens, and confirmed that basal-like breast cancer is a fully independent entity, as distinct from other breast cancer subtypes as it is from tumors of other tissues.

In addition to the results reported in the paper, one of the most interesting contributions of this work is the resource it generated: omics data for more than 5000 tumors from 12 different cancer types, measured on multiple platforms. All these data are now accessible as a unified resource in Synapse to support integrative bioinformatics analysis (at https://www.synapse.org/#!Synapse:syn2468297). In addition, the results have been made available through several portals to facilitate their navigation, including the UCSC Genome Browser, Gitools, and MD Anderson’s Next Generation Heatmaps.

Our contribution to this part of the project has been to prepare all the TCGA pan-cancer-12 datasets used in this study, together with their subtype information, so that they can be navigated with Gitools interactive heatmaps. These data are available for download at http://www.gitools.org/datasets/pancancer12 and can also be opened directly from the web with the latest version of Gitools. In this heatmap (see figure below), the columns are tumor samples, the rows are genes, and each cell holds multiple values: the mutation and copy number status of the gene in the tumor sample, and its expression and methylation levels. Each sample is annotated with its tissue of origin and molecular subtype (see figure below), among many other annotation values that are available from within the heatmaps. Multiple Gitools options, such as sort, filter, zoom, search, and cluster, allow you to navigate this large amount of data interactively and extract meaningful information.
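To make the layout of the heatmap more concrete, here is a minimal Python sketch of the same idea: genes as rows, tumor samples as columns, several values per cell, and annotations attached to each sample. It is only an illustration and does not reflect the Gitools file format or API. The names `Cell`, `samples`, `heatmap`, `filter_samples` and `sort_samples`, the sample ids, the copy number encoding and all numeric values are invented for this example and are not taken from the pan-cancer-12 dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Values for one gene in one tumor sample (toy layout, not Gitools' format)."""
    mutated: bool
    copy_number: int    # assumed encoding: -1 loss, 0 neutral, 1 gain
    expression: float
    methylation: float


# Column annotations: sample id -> (tissue of origin, molecular subtype).
# Sample ids and annotations are made up for illustration.
samples = {
    "S1": ("lung", "squamous-like"),
    "S2": ("bladder", "squamous-like"),
    "S3": ("breast", "basal-like"),
}

# Heatmap body: (gene, sample id) -> Cell. All values are made up.
heatmap = {
    ("TP53", "S1"): Cell(True, -1, 0.4, 0.2),
    ("TP53", "S2"): Cell(False, 0, 1.1, 0.3),
    ("TP53", "S3"): Cell(True, 0, 0.7, 0.1),
    ("EGFR", "S1"): Cell(False, 1, 2.3, 0.1),
    ("EGFR", "S2"): Cell(False, 0, 0.9, 0.4),
    ("EGFR", "S3"): Cell(False, 0, 1.5, 0.2),
}


def filter_samples(subtype):
    """Return the ids of the samples annotated with the given subtype."""
    return [s for s, (_tissue, st) in samples.items() if st == subtype]


def sort_samples(gene, value):
    """Order all sample ids by one cell value of `gene`, e.g. 'expression'."""
    return sorted(samples, key=lambda s: getattr(heatmap[(gene, s)], value))


if __name__ == "__main__":
    print(filter_samples("squamous-like"))
    print(sort_samples("TP53", "expression"))
```

Filtering and sorting here act on columns only, which is roughly what happens when you keep the samples of one subtype or reorder samples by the expression of a gene of interest in an interactive heatmap.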

Screenshot of TCGA pancancer 12 dataset in Gitools.

 

If you want to get a glimpse of how to navigate TCGA data in Gitools, you can watch the Gitools Videos, and if you want to learn everything you can do with Gitools, you can follow the Gitools – From A to Z tutorial.

We hope you find this new cancer genomics resource useful. Please let us know about your experience navigating these data.