Basic and intuitive analysis of microarray datasets using Gitools (Part 1)

After preparing some tutorials for our software, I thought it would be useful to show how basic analyses on microarray data can be carried out using Gitools.

For that, I first needed to find a nice dataset that would serve as an example in Gitools. Gunes pointed me to a recent paper by Hou et al. (2010). The authors of this paper identified interesting gene signatures in non-small cell lung cancer that are useful for histopathological classification and for prediction of clinical outcome. In this study, they profiled 156 lung tumors and adjacent normal lung tissue samples. This is a very interesting dataset, and I decided to use it to prepare a series of step-by-step tutorials covering each of the analysis methods currently available in Gitools, namely: Enrichment analysis, Correlations, Oncodrive, Combination of experiments and Overlaps.

I started with a matrix of median-centered log-intensity values divided by the standard deviation for the 156 samples. First I ran a pathway enrichment analysis to find pathways that are significantly up- or down-regulated in each sample. I used the mean z-score analysis, which compares the observed mean expression value of a set of genes (a module, here a pathway) with the mean expected by chance. Large positive and negative z-scores indicate that the genes in the module have significantly higher or lower expression values, respectively, in that sample.
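To make the idea concrete, here is a minimal sketch of both steps in plain Python. It is not the Gitools implementation: I am assuming that the normalization is applied per gene across samples, and that the expected mean and its spread are estimated from random gene sets of the same size as the module. The function names, the toy gene names and the toy values are all hypothetical.

```python
import random
import statistics


def standardize_row(log_values):
    """Median-center one gene's log-intensities and divide by their standard deviation.

    Assumption: the normalization is done per gene, across samples.
    """
    med = statistics.median(log_values)
    sd = statistics.stdev(log_values)
    return [(v - med) / sd for v in log_values]


def mean_z_score(sample_values, module, n_random=1000, seed=0):
    """Z-score of a module's mean expression in one sample.

    sample_values: dict mapping gene -> normalized expression in this sample.
    module: iterable of gene names (for example, the genes of one pathway).
    Assumption: the expected mean and its spread are estimated by drawing
    n_random random gene sets of the same size from the same sample.
    """
    rng = random.Random(seed)
    genes = [g for g in module if g in sample_values]
    if not genes:
        raise ValueError("no module genes found in this sample")
    observed = statistics.fmean(sample_values[g] for g in genes)
    background = list(sample_values.values())
    random_means = [
        statistics.fmean(rng.sample(background, len(genes)))
        for _ in range(n_random)
    ]
    expected = statistics.fmean(random_means)
    spread = statistics.stdev(random_means)
    return (observed - expected) / spread


# Toy sample: 200 hypothetical genes with random values, plus one module
# shifted up and one shifted down so that the effect is easy to see.
toy_rng = random.Random(42)
sample = {f"gene{i}": toy_rng.gauss(0, 1) for i in range(200)}
up_module = [f"gene{i}" for i in range(10)]
down_module = [f"gene{i}" for i in range(10, 20)]
for g in up_module:
    sample[g] += 3.0
for g in down_module:
    sample[g] -= 3.0

z_up = mean_z_score(sample, up_module)
z_down = mean_z_score(sample, down_module)
print(f"up module z = {z_up:.2f}, down module z = {z_down:.2f}")
```

Running this for every pathway in every sample gives one z-score per pathway and sample, which is what fills the cells of the enrichment heatmap.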

The result is a big heatmap with samples as columns and pathways as rows. Each cell contains the result of the z-score analysis for a particular pathway in a particular sample. The interactive capabilities of the Gitools heatmap viewer help to interpret the results intuitively. For example, after sorting the columns (samples) by histology, I could identify very clear differences between normal and tumor samples. Cell cycle genes tend to have higher expression values in the tumor samples in this dataset, while apoptosis genes and genes involved in the MAPK signaling pathway tend to have lower expression values than in normal samples.
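Outside the viewer, the same kind of reading can be sketched in a few lines of Python. This is only an illustration of sorting columns by an annotation and comparing groups. The sample names and z-scores below are invented for the example and are not values from the Hou et al. dataset.

```python
from statistics import fmean

# Hypothetical annotation and z-scores, invented for illustration only.
histology = {"s1": "tumor", "s2": "normal", "s3": "tumor", "s4": "normal"}
pathway_z = {
    "cell cycle": {"s1": 3.1, "s2": -1.2, "s3": 2.4, "s4": -0.8},
    "apoptosis": {"s1": -2.0, "s2": 1.5, "s3": -1.7, "s4": 0.9},
}


def sort_columns(annotation):
    """Order sample names by their annotation label, then by name."""
    return sorted(annotation, key=lambda s: (annotation[s], s))


def group_means(row, annotation):
    """Average the z-scores of one pathway within each annotation group."""
    groups = {}
    for sample_name, z in row.items():
        groups.setdefault(annotation[sample_name], []).append(z)
    return {label: fmean(zs) for label, zs in groups.items()}


columns = sort_columns(histology)
for pathway, row in pathway_z.items():
    ordered = [row[s] for s in columns]
    print(pathway, columns, ordered, group_means(row, histology))
```

Sorting puts all samples of the same histology next to each other, so a pathway that behaves differently in the two groups shows up as a block of one color followed by a block of the other.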

On the left, gene expression heatmap. On the right, pathway enrichment heatmap.

You can try to reproduce this analysis by following the step-by-step tutorial I have prepared. Enjoy it!

In upcoming posts I will explain the other analyses and the results obtained with this dataset.