The making of Condel (CONsensus DELeteriousness Score)

Our article on a Consensus deleteriousness score of missense Single Nucleotide Variants was published online yesterday and will be included in the April issue of the American Journal of Human Genetics. We want to invite you to read it, and to give you a glimpse of how we came up with the idea.

A few months back we started searching for computational methods to assess the effect of missense Single Nucleotide Variants (SNVs) on proteins. Our primary aim was to identify an array of methods that could be used by projects within the International Cancer Genome Consortium to help prioritize deleterious SNVs from the large collections that usually appear in cancer samples. We were looking for computational tools that fulfilled two main requirements: they had to be easily downloaded and installed locally, and they should perform well in the task of separating likely deleterious from likely neutral SNVs. While the first condition was easy to assess, we realized that the latter was rather cumbersome. There were no common benchmarking studies of the performance of different methods, and most of them had been tested on different datasets of experimentally known deleterious and neutral SNVs.

Therefore, our first idea was to benchmark all downloadable tools on a common dataset of SNVs. We chose two datasets, each containing several thousand disease-related and non-damaging SNVs, that had been curated by the authors of one of the methods we were interested in benchmarking. We used five tools that assess the probability that an amino acid change is accepted in evolution to calculate the deleteriousness of every SNV in the two datasets.
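As a rough illustration of what benchmarking several tools on a common dataset involves, here is a minimal Python sketch that computes the area under the ROC curve for each tool against the same labeled SNVs. The tool names, scores and labels are made up for the example, and we assume every score has been oriented so that higher means more likely deleterious. This is not the code used for the paper.

```python
from itertools import product


def roc_auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    scores: one score per SNV; higher is assumed to mean "more likely deleterious".
    labels: True for known deleterious SNVs, False for known neutral ones.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one deleterious and one neutral SNV")
    # A (deleterious, neutral) pair counts as 1 if ranked correctly, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy benchmark: the same six SNVs scored by two made-up tools.
labels = [True, True, True, False, False, False]
tool_scores = {
    "tool_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "tool_b": [0.7, 0.6, 0.9, 0.3, 0.2, 0.1],
}
aucs = {name: roc_auc(s, labels) for name, s in tool_scores.items()}
```

Because every tool is scored against the same labels, the resulting values are directly comparable, which is the point of using a common dataset. The pairwise loop is quadratic in the number of SNVs, so for datasets of several thousand variants a rank-based implementation would be a better fit.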

At some point in the process, we realized that since practically all SNVs were classified by at least three tools, we could implement a method that combined their classifications to obtain a more accurate assessment of the deleteriousness of each SNV. We tried several ways to integrate the outputs of the five tools, and finally found that a weighted average of the scores of the individual tools increased the accuracy of the classification of both datasets of SNVs to values around 90%, as the ROC curves below show. This weighted average of the scores of different tools may be regarded as a measurement of the degree of agreement among individual methods about the likelihood that an SNV is deleterious. We have therefore named it Consensus deleteriousness score of SNVs, or Condel.

In the figure, the ROC curves that correspond to the five original tools (namely polyphen2, SIFT, LogR PFam E-value, MutationAssessor and MAPP) are dotted lines; the integrated scores are continuous lines. PPH2, polyphen2; logre, LogR PFam E-value; massess, MutationAssessor; SVS, Simple Vote Score; WVS, Weighted Vote Score; SAS, Simple Average Score; WAS, Weighted Average Score.
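The weighted average described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each tool's score is taken to be already rescaled to the range 0 to 1 with higher meaning more deleterious, a tool that gives no prediction for an SNV is simply left out of the average, and the weights and scores below are made-up placeholders rather than the values Condel uses.

```python
def weighted_average(scores, weights):
    """Combine per-tool scores for one SNV into a single score.

    scores: dict of tool name -> score in [0, 1], or None if the tool
            produced no prediction for this SNV.
    weights: dict of tool name -> non-negative weight (hypothetical values).
    """
    # Only tools that actually scored the SNV contribute to the average.
    used = {tool: s for tool, s in scores.items() if s is not None}
    total_weight = sum(weights[tool] for tool in used)
    if total_weight == 0:
        raise ValueError("no usable tool scores for this SNV")
    return sum(weights[tool] * s for tool, s in used.items()) / total_weight


def simple_average(scores):
    """Unweighted average: the special case where every tool has weight 1."""
    return weighted_average(scores, {tool: 1.0 for tool in scores})


# Placeholder scores for a single SNV; one tool gave no prediction.
snv_scores = {
    "sift": 0.9,
    "polyphen2": 0.8,
    "mutation_assessor": None,
    "logre": 0.4,
    "mapp": 0.7,
}
# Placeholder weights, chosen only to make the arithmetic easy to follow.
tool_weights = {
    "sift": 0.5,
    "polyphen2": 1.0,
    "mutation_assessor": 1.0,
    "logre": 0.5,
    "mapp": 1.0,
}

was = weighted_average(snv_scores, tool_weights)
sas = simple_average(snv_scores)
```

Dividing by the sum of the weights of the tools that were actually used keeps the combined score in the same 0 to 1 range regardless of how many tools classified a given SNV.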

In the end, we realized that the rationale behind Condel could be applied in principle to any such array of methods that assess the likelihood of deleteriousness of SNVs. As a matter of fact, different arrays of methods may work better on different datasets.

If this story interested you, please don’t forget to visit Condel at http://bg.upf.edu/condel.

Posted April 1st, 2011 | Categories: BG News

4 Comments

  1. Brad Chapman April 1, 2011 at 11:20 am

    Abel;
    Congratulations on the new paper; Condel sounds like a very useful consensus
    method. Is there software available to utilize the method? Thanks much

    • abel April 3, 2011 at 5:19 pm

      Thank you, Brad.
      You can download a tarball from our web server (http://bg.upf.edu/condel) containing a Perl script that implements the weighted average score calculation, along with the files with the complementary cumulative distributions of the scores of SNVs in HumVar, which are needed to compute the weights. Also, a variant of Condel integrating only SIFT and polyphen2 will be incorporated into version 62 of the Ensembl-variation API, to be released this month.

  2. Mark Aquino May 21, 2011 at 12:33 am

    First of all, I’d like to say great work. I am trying to combine the SIFT and PPH2 scores with Mutation Assessor scores, but I haven’t been able to find a way to download their database or the script they use to compute the scores so that I can do this locally. Any help you could provide would be greatly appreciated.

  3. abel May 23, 2011 at 4:04 pm

    Hi Mark,

    I assume you’re talking about MutationAssessor. Right now, the only way to obtain their scores is to submit your SNPs in batch mode to their server (http://mutationassessor.org) and retrieve the results through their WebAPI. It’s all explained on their website, and it works really well. Thank you very much for your comment.

    Abel
