Condel goes Ensembl!

Yes, Ensembl-variation 62 is out, and Condel has been included in its API. From this version on, the effects of all non-synonymous Single Nucleotide Variants (SNVs) may be assessed with both SIFT and Polyphen-2, and their scores integrated using Condel.

Although we originally developed Condel (Consensus Deleteriousness score of missense SNVs) to integrate the output of five computational tools, as explained in a previous post on this blog, shortly after submitting the manuscript of the paper we started running tests to evaluate the performance of Condel using only subsets of these tools. We found that the accuracy of a Condel score computed by averaging the weighted scores of SIFT, Polyphen-2 and MutationAssessor was even higher than when the scores of all five tools were employed (see the ROC curve below). We figured that these three tools provide a great degree of coherence in the classification of the two datasets we tested.

ROC curves of the three original tools (namely Polyphen-2, SIFT and MutationAssessor) are shown as dotted lines; the integrated scores of different combinations of tools using the Condel approach are shown as continuous lines. WAS, Weighted Average Score.
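For readers who have not worked with ROC curves, here is a minimal sketch of how the area under such a curve can be computed for any scoring tool. It uses the rank interpretation of the area: the probability that a randomly chosen deleterious variant receives a higher score than a randomly chosen neutral one. The function name and the toy labels and scores are ours, made up for illustration; they are not the datasets or the code used to draw the figure.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for binary labels (1 = deleterious, 0 = neutral).

    Computed as the fraction of (deleterious, neutral) pairs in which the
    deleterious variant has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one deleterious and one neutral variant")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): three deleterious and three neutral variants.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of variants, which is fine for a demonstration but too slow for tens of thousands of SNVs; a sort-based implementation from an established statistics library would be the sensible choice there.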

More or less around the same time, the Ensembl-variation crew, which had already pre-computed the SIFT and Polyphen-2 scores for the entire human proteome, became interested in incorporating the Condel score into their API, in order to provide users with a simple and reliable way to classify SNVs as either deleterious or neutral. So we evaluated the accuracy of a Condel score calculated only from the scores of SIFT and Polyphen-2 and, to our surprise, found that it was close to that of the Condel score calculated from the original five tools (see ROC curve above). As a result, they implemented the calculation of Condel as part of the Ensembl-variation API. Now, either by using the Variant Effect Predictor distributed with the API or directly at the Ensembl-variation webserver, you may obtain the scores provided by SIFT and Polyphen-2 for any non-synonymous SNV within the human proteome, as well as their integration through a weighted average score using Condel.
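To make the idea of a weighted average score concrete, here is a minimal sketch of how two tool scores could be combined. Everything specific in it is an assumption for illustration: the function names, the example scores and, above all, the weights, which are placeholders and not the weights Condel actually uses. We also assume that the SIFT score is flipped first, since SIFT gives low scores to damaging substitutions while Polyphen-2 gives them high scores. This is not the code in the Ensembl-variation API.

```python
from typing import Mapping


def sift_to_deleteriousness(sift_score: float) -> float:
    """Flip a SIFT score so that higher means more deleterious (illustrative assumption)."""
    return 1.0 - sift_score


def weighted_average_score(
    scores: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Weighted average of per-tool scores, all oriented so that higher = more deleterious."""
    if not scores:
        raise ValueError("at least one tool score is required")
    total_weight = sum(weights[tool] for tool in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[tool] * scores[tool] for tool in scores)
    return weighted_sum / total_weight


# Hypothetical scores for a single variant.
example_scores = {
    "SIFT": sift_to_deleteriousness(0.02),
    "Polyphen-2": 0.95,
}
# Placeholder weights, NOT the ones used by Condel.
example_weights = {"SIFT": 0.6, "Polyphen-2": 0.9}

was = weighted_average_score(example_scores, example_weights)
```

Because the result is normalised by the sum of the weights, adding or dropping a tool only requires changing the two dictionaries, which mirrors the way we tested different subsets of tools.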

Now, just a word about the downside of our findings. The high accuracy attained by Condel using only a subset of two or three of the original tools may be at least in part due to a dataset bias effect. So far, we have only tested Condel using two datasets of several tens of thousands of well-curated SNVs. The effective classification of larger and more complex datasets may require the use of the scores of the original five tools, or maybe even the incorporation of new tools. We are currently working on assessing the accuracy of Condel in larger SNV datasets, and trying to improve and diversify its abilities by incorporating functional information into the evaluation of deleteriousness.

Anyway, despite this warning, if you are interested in checking whether your SNVs may be involved in the onset of disease, we invite you to try Condel. With the Ensembl-variation API, this is now even easier. Explore it; don’t take our word for it! And send us whatever feedback you think may be useful. It will certainly be so for us.