Basic and intuitive analysis of microarray datasets using Gitools (Part 2)

In this series of posts I am showing how expression data can be analyzed using Gitools. In a previous post I explained how to do pathway enrichment analysis. Here I explain how to identify genes that are significantly up-regulated in a cancer dataset using Oncodrive. This is part of a step-by-step tutorial that I recently prepared for Gitools users.

For this analysis I used the same dataset that was introduced in the previous post, namely a microarray dataset profiling 156 lung tumors and adjacent normal lung tissue samples by Hou et al. 2010. However, in this case I preprocessed it to obtain a matrix of log2 ratios between tumor and normal samples.
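To make the preprocessing step concrete, here is a minimal sketch of how a matrix of log2 ratios could be computed. It is not the exact procedure I used: it assumes linear-scale expression values and takes the mean of the normal samples as the reference for each gene. The function name `log2_ratios` and the toy values are made up for illustration.

```python
from math import log2
from statistics import mean

def log2_ratios(tumor, normal):
    """Return gene -> list of log2(tumor / mean normal) values.

    Assumption: `tumor` and `normal` map each gene to a list of
    linear-scale (not yet log-transformed) expression values.
    """
    ratios = {}
    for gene, tumor_values in tumor.items():
        reference = mean(normal[gene])  # per-gene normal reference
        ratios[gene] = [log2(value / reference) for value in tumor_values]
    return ratios

# Toy data, invented for illustration only.
tumor = {"GENE_A": [8.0, 2.0], "GENE_B": [1.0, 4.0]}
normal = {"GENE_A": [2.0, 2.0], "GENE_B": [4.0, 4.0]}
ratios = log2_ratios(tumor, normal)
```

A positive value means the gene is more expressed in the tumor than in the normal reference, and a negative value means the opposite.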

As my objective was to identify genes that are significantly up-regulated in this experiment, I used Oncodrive. This is a simple statistical method that, given a matrix of alterations for genes and samples, assesses for each gene whether it is altered in more samples than expected by chance. This is the method that we use in IntOGen to identify significantly altered genes. The result is a p-value per gene that indicates whether the gene is significantly altered (up-regulated in this case).
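The idea of "altered in more samples than expected by chance" can be sketched with a one-sided binomial test. This is only an illustration of the idea, not Oncodrive's actual implementation: it assumes that the chance expectation is the overall fraction of altered cells in the matrix, and the names `binomial_tail` and `altered_gene_pvalues` are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def altered_gene_pvalues(matrix):
    """matrix: gene -> list of 0/1 flags (1 = altered in that sample).

    Assumption: the background alteration rate is the fraction of
    altered cells over the whole matrix.
    """
    total = sum(len(flags) for flags in matrix.values())
    altered = sum(sum(flags) for flags in matrix.values())
    background = altered / total
    return {
        gene: binomial_tail(sum(flags), len(flags), background)
        for gene, flags in matrix.items()
    }

# Toy alteration matrix, invented for illustration only.
toy = {
    "GENE_A": [1, 1, 1, 1, 1, 1, 1, 1],
    "GENE_B": [0, 0, 0, 1, 0, 0, 0, 0],
    "GENE_C": [0, 0, 0, 0, 0, 0, 0, 0],
}
pvalues = altered_gene_pvalues(toy)
```

In this toy matrix GENE_A is altered in every sample and gets a very small p-value, while GENE_C is never altered and gets a p-value of 1. With thousands of genes, the p-values would also need a multiple testing correction before being interpreted.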

I applied this method to the matrix of log2 ratios of Hou et al. 2010, considering up-regulation any fold change higher than 1.297. This is the optimal cutoff for this dataset, obtained as explained in the supplementary material of the IntOGen paper. As a result, I obtained a Gitools matrix with a single column containing one p-value per gene. I found, for example, that the CSK2 and SOX4 genes are significantly up-regulated, and I know that those two genes are also up-regulated in other lung cancer experiments in IntOGen.
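Applying the cutoff turns the matrix of log2 ratios into the matrix of alterations that the method expects. The sketch below assumes that the 1.297 cutoff is compared directly against the values in the matrix and that only values strictly above it count as up-regulated. The function name `binarize` and the toy values are hypothetical.

```python
CUTOFF = 1.297  # cutoff used for this dataset

def binarize(ratios, cutoff=CUTOFF):
    """Return gene -> list of 0/1 flags (1 = value above the cutoff)."""
    return {
        gene: [1 if value > cutoff else 0 for value in values]
        for gene, values in ratios.items()
    }

# Toy matrix values, invented for illustration only.
toy_ratios = {
    "GENE_A": [2.0, 1.297, -0.5],
    "GENE_B": [1.3, 0.0, 3.1],
}
alterations = binarize(toy_ratios)
```

Each 1 in the resulting matrix marks a sample in which the gene is considered up-regulated, and this is the input on which the per-gene p-values are computed.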

You can try to reproduce this analysis by following this step-by-step tutorial. Enjoy it!

In a future post I plan to explain how to check whether the genes identified as significantly up-regulated in this dataset are also up-regulated in other experiments from IntOGen.