How to perform a hierarchical clustering using interactive heatmaps in Gitools

In the latest version of Gitools, version 2.1, we have improved the clustering of heatmaps. Here we explain in detail how to perform a hierarchical clustering and interpret its result, and why it is a bit different from the rest.

Hierarchical clustering in Gitools: The lines in the heatmap header represent the hierarchical tree (dendrogram) splitting at different levels. The root of the tree is located at the bottom, the leaves at the top. See video at YouTube

Perform a hierarchical clustering

Once we are viewing a data heatmap that we would like to cluster, we select the menu Analysis > Clustering. A wizard dialog pops up and asks which clustering method to use and how to perform it.

See a short video on hierarchical clustering at YouTube.

In the first step we select hierarchical clustering and choose whether to cluster columns or rows. If the heatmap is not currently showing the values we want to cluster on, we also change the “Take values from” selection.

The second step gives us the chance to modify the distance measure. If you do not have any special requirements, go ahead and directly hit the Finish button. After a moment the clustering should be finished and the hierarchical tree visible. The result is explained in detail in the next section.

The result: Hierarchical tree, heatmap header, heatmap order and bookmark

One challenging issue with clustering in Gitools was how to show the clustering results while maintaining the interactive capabilities of the heatmap. We solved this by showing the hierarchical organization of the columns or rows as colored bars (see figure above). In addition, the Newick tree is provided.

The hierarchical clustering produces four things:

  • Hierarchical tree or dendrogram. The tree is painted as a static image in a new tab. It can be exported to an image file via File > Export Hierarchical tree as image.
  • A heatmap header. Ten levels of the hierarchical tree are added as a header of the heatmap.
  • Heatmap order. The order of the heatmap is changed according to the clustering. While it respects the order of the splits, we calculate the order in which the different leaves of the tree should go with a density function. This ensures that clusters (or leaves) which are more similar to each other are placed next to each other, which avoids creating a perception of “artificially distinct” clusters.
  • A bookmark. After applying the clustering order we add a one-dimensional bookmark to Gitools which contains the exact order of the clustered dimension (rows or columns). If the heatmap order is changed later, this bookmark can be applied to restore the clustered order.
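The relationship between the tree, its Newick representation and the heatmap order can be sketched in a few lines. This is a minimal illustration, not Gitools code: the tree is a hypothetical nested tuple of column names, and all function names are assumptions. It shows that a tree fixes the splits but not the left-to-right order of the leaves, which is why an extra ordering criterion (the density function mentioned above, not implemented here) is needed.

```python
def to_newick(node):
    """Serialize a nested-tuple tree to Newick (topology only, no branch lengths)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"

def leaf_order(node):
    """Left-to-right order of the leaves, i.e. the order the heatmap would take."""
    if isinstance(node, str):
        return [node]
    order = []
    for child in node:
        order.extend(leaf_order(child))
    return order

def flip(node):
    """Mirror every split; the tree stays the same, only the drawing order changes."""
    if isinstance(node, str):
        return node
    return tuple(flip(child) for child in reversed(node))

def clades(node):
    """Set of leaf groups defined by the internal nodes (the 'splits')."""
    if isinstance(node, str):
        return set()
    found = {frozenset(leaf_order(node))}
    for child in node:
        found |= clades(child)
    return found

# Hypothetical clustering result for four columns A-D.
tree = ((("A", "B"), "C"), "D")
flipped = flip(tree)

print(to_newick(tree) + ";")     # (((A,B),C),D);
print(leaf_order(tree))          # ['A', 'B', 'C', 'D']
print(to_newick(flipped) + ";")  # (D,(C,(B,A)));
print(leaf_order(flipped))       # ['D', 'C', 'B', 'A']
print(clades(tree) == clades(flipped))  # True
```

Both trees describe exactly the same clusters, yet they give different column orders. A stored order such as the bookmark is therefore a useful complement to the tree itself.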

How to interpret the hierarchical trees as colored bars/clusters

The header that has been added to the heatmap is a summary of the hierarchical tree. The image above shows a hierarchical clustering header for the columns. By default, 10 levels of the hierarchical tree are displayed. The root level of the tree is shown at the bottom, closest to the heatmap. Each level towards the top contains more splits.

Some properties to consider when interpreting the above image:

  • Broad and detailed clusters. In the image above, each horizontal line represents the split branches of the hierarchical tree at a given level. You may decide which level(s) work best for you, depending on how many clusters you want to have. The more levels you move up, the more fine-grained the clusters are.
  • Outlier columns become leaves fast. After very few splits they are considered fully clustered and therefore receive no more clusters in upper levels. In the heatmap header this is reflected as white space. The left-most columns in the image have become leaves after level 5.
  • Peaks represent regions of higher similarity. As outliers soon receive no more clusters, the columns with many clusters assigned (horizontal bars at many levels) are the ones that are more difficult for the algorithm to tell apart. As a consequence, peaks or high plateaus form. In some cases these may be useful to tell two columns apart in great detail; in other cases the similarity is high enough that this information can be overlooked.
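The properties above can be reproduced with a small sketch that turns a tree into header levels. It rests on assumptions that the text does not confirm: the tree is a hypothetical nested tuple, every remaining cluster is split once per level, and a column keeps its bar on the level where it first stands alone and is blank (None) on all levels above. Gitools may derive its levels differently.

```python
def leaves(node):
    """All leaf names below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def header_levels(tree, max_levels=10):
    """Return the column order and one row of cluster ids per tree level.

    rows[0] is the root level. Cluster ids are only meaningful within a row.
    None marks a column that already became a leaf on a lower level.
    """
    order = leaves(tree)
    rows = []
    current = [tree]
    for _ in range(max_levels):
        if not current:
            break
        label = {}
        for cluster_id, node in enumerate(current):
            for leaf in leaves(node):
                label[leaf] = cluster_id
        rows.append([label.get(leaf) for leaf in order])
        # Split every internal node once; nodes that are leaves drop out.
        current = [child for node in current
                   if not isinstance(node, str) for child in node]
    return order, rows

def render(order, rows):
    """Text version of the header: finest level on top, root at the bottom."""
    lines = [" ".join("." if c is None else str(c) for c in row)
             for row in reversed(rows)]
    lines.append(" ".join(order))
    return "\n".join(lines)

# Hypothetical tree: X is an outlier, A-D form two similar pairs.
tree = ("X", (("A", "B"), ("C", "D")))
order, rows = header_levels(tree)
print(render(order, rows))
```

In the printed header the outlier X shows a dot (white space) from the third level upwards, while A to D keep receiving bars and form the kind of plateau described above. Reading a single row from left to right gives the clusters at that level of detail.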

And remember, the whole heatmap with the hierarchical clustering header can be exported to an image via Export > Save heatmap to image.