Non-synonymous SNPs that modify conserved residues
A non-synonymous SNP is a SNP whose alleles produce different protein sequences.
The omega value, the ratio of non-synonymous to synonymous substitution rates (dN/dS), is an estimate of the selective pressure on amino acid replacement mutations in protein-coding genes. SNPs at codons with omega values below 0.1 are selected as putatively pathological SNPs that most likely affect protein function.
Estimates of selective pressure at the codon level are obtained through two different methods:
- Omega-bay values: calculated using codon-based maximum likelihood models, implemented in the codeml program of the PAML package.
Two models are used for the computation, M2 and M8, which assume different distributions for the omega classes.
In both cases, the codon omega-bay values are estimated with the Bayes Empirical Bayes approach.
- Omega-slr values: obtained with the Slr program by Massingham T. et al.
Slr is a site-wise likelihood ratio method for estimating the selective pressure acting at amino acid sites.
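The selection step can be sketched as a simple filter. This is a minimal illustration, not the pipeline's actual code: the record layout, the SNP identifiers and the choice to flag a SNP when either estimate falls below the cutoff are assumptions; only the 0.1 threshold comes from the text above.

```python
from typing import NamedTuple

OMEGA_THRESHOLD = 0.1  # cutoff stated in the text


class CodonSnp(NamedTuple):
    snp_id: str       # hypothetical SNP identifier
    omega_bay: float  # per-codon omega from the codeml (BEB) analysis
    omega_slr: float  # per-codon omega from Slr


def putatively_pathological(snps, threshold=OMEGA_THRESHOLD):
    """Return ids of SNPs whose codon omega is below the threshold.

    Assumption: a SNP is kept if EITHER estimate is below the cutoff.
    """
    return [s.snp_id for s in snps
            if s.omega_bay < threshold or s.omega_slr < threshold]


# Made-up values, for illustration only.
example_snps = [
    CodonSnp("snp_a", 0.05, 0.30),
    CodonSnp("snp_b", 0.80, 0.95),
    CodonSnp("snp_c", 0.40, 0.02),
]
print(putatively_pathological(example_snps))
```

Requiring both estimates to fall below the cutoff would only need `or` replaced by `and`.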
SNPs in rat genes orthologous to human disease or cancer genes
Rat and human orthologous genes were obtained from the Ensembl Compara database. These orthologous genes were matched against human disease and cancer genes. Finally, all non-intergenic SNPs discovered under the STAR project were mapped onto the rat orthologs of human disease and cancer genes.
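The three steps above amount to a join between ortholog pairs, a disease gene list and SNP-to-gene assignments. A minimal sketch follows; the function name, the input layouts and all identifiers are hypothetical and do not reflect the Ensembl Compara schema.

```python
def snps_in_disease_orthologs(ortholog_pairs, human_disease_genes, snp_to_rat_gene):
    """Map SNPs onto rat genes whose human ortholog is a disease or cancer gene.

    ortholog_pairs:      iterable of (rat_gene, human_gene) tuples
    human_disease_genes: set of human gene ids
    snp_to_rat_gene:     dict snp_id -> rat gene id (intergenic SNPs excluded upstream)
    """
    # Step 1 and 2: rat genes with at least one ortholog in the disease set.
    disease_rat_genes = {rat for rat, human in ortholog_pairs
                         if human in human_disease_genes}
    # Step 3: keep only SNPs that fall in those rat genes.
    return {snp: gene for snp, gene in snp_to_rat_gene.items()
            if gene in disease_rat_genes}


# Placeholder identifiers, for illustration only.
pairs = [("rat_gene_1", "human_gene_A"), ("rat_gene_2", "human_gene_B")]
disease = {"human_gene_A"}
snp_genes = {"snp_1": "rat_gene_1", "snp_2": "rat_gene_2", "snp_3": "rat_gene_1"}
print(snps_in_disease_orthologs(pairs, disease, snp_genes))
```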
SNPs that create new splice sites
All non-intergenic SNPs were scored with GeneID, searching for the creation of new donor (dinucleotide GT) or acceptor (dinucleotide AG) splice sites. GeneID is a program, designed with a hierarchical structure, to predict genes in anonymous genomic sequences.
In the first step, Position Weight Arrays (PWAs) are used to score splice sites and start and stop codons. In the second step, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov Model for coding DNA. Finally, the gene structure is assembled from the set of predicted exons, maximizing the sum of the scores of the assembled exons.
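As a rough illustration of what "creating" a splice site means at the sequence level, the sketch below only checks whether the alternative allele introduces a GT or AG dinucleotide. It does not reproduce GeneID's PWA scoring; the function name and the 0-based coordinate convention are assumptions.

```python
SPLICE_DINUCLEOTIDES = {"GT": "donor", "AG": "acceptor"}


def new_splice_dinucleotides(ref_seq, pos, alt):
    """List (start, kind) for GT/AG dinucleotides introduced by a SNP.

    ref_seq: reference sequence (upper-case DNA)
    pos:     0-based position of the SNP in ref_seq
    alt:     alternative allele (a single base)
    """
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    created = []
    # Only the two dinucleotides that overlap the SNP position can change.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(ref_seq):
            continue
        ref_di = ref_seq[start:start + 2]
        alt_di = alt_seq[start:start + 2]
        if alt_di in SPLICE_DINUCLEOTIDES and alt_di != ref_di:
            created.append((start, SPLICE_DINUCLEOTIDES[alt_di]))
    return created


print(new_splice_dinucleotides("CCGATCC", 3, "T"))
print(new_splice_dinucleotides("CCACTCC", 3, "G"))
```

A candidate found this way would still have to be scored in its sequence context before it could be called a predicted splice site.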
SNPs that affect microRNA targets
The microRNA target prediction program miRanda was used to scan rat microRNAs (miRBase) against all the SNPs in 3'UTR regions. miRanda is an algorithm that predicts microRNA targets using dynamic programming and thermodynamics.
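To give an intuition for how a single base change can alter a target site, the sketch below uses a much simpler heuristic than miRanda: an exact match to the reverse complement of microRNA nucleotides 2-8 (the seed). It is not miRanda's alignment and free-energy scoring; the function names are hypothetical and the sequences are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """mRNA sequence (5'->3') that pairs perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def seed_match_change(utr, pos, alt, mirna):
    """Classify a 3'UTR SNP as 'created', 'disrupted' or 'unchanged'.

    utr: 3'UTR as RNA (A, C, G, U); pos: 0-based SNP position; alt: new base.
    """
    site = seed_site(mirna)
    alt_utr = utr[:pos] + alt + utr[pos + 1:]
    # Window containing every possible site that overlaps the SNP position.
    lo = max(0, pos - len(site) + 1)
    hi = pos + len(site)
    before = site in utr[lo:hi]
    after = site in alt_utr[lo:hi]
    if before and not after:
        return "disrupted"
    if after and not before:
        return "created"
    return "unchanged"


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative microRNA sequence
print(seed_site(mirna))
print(seed_match_change("AAACUACCUCAAA", 5, "G", mirna))
```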
SNPs that affect conserved TFBS
Promoter sequences are functional regions (200-2000 nt long) located upstream of the transcription start site (TSS) of a gene. Transcription factors (TFs) bind transcription factor binding sites (TFBSs), specific motifs in the DNA (usually 5-15 nt long). A TF can bind to more than one TFBS and recruit RNA polymerase II. The promoter regions of orthologous human and rat genes are obtained by extracting 1000 bp upstream of the TSS (Ensembl). The JASPAR 1.0 collection of matrices (PWMs for TFBSs) is used to obtain the corresponding TF-map for each gene (human and rat ortholog) and to detect TFBSs. Cross-species promoter meta-alignments between the maps of each pair of orthologous human-rat genes are then produced. SNPs that overlap conserved TFBSs are considered to have a putative effect on the expression of the gene.
Related article: Blanco et al. 2006
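The final overlap step can be sketched as an interval test. The sketch assumes the PWM scan and the meta-alignment have already labelled each predicted site as conserved or not; the `Tfbs` record, the factor names and the promoter-relative 0-based coordinates are hypothetical.

```python
from typing import NamedTuple


class Tfbs(NamedTuple):
    factor: str      # name of the transcription factor (placeholder values below)
    start: int       # 0-based start within the promoter, inclusive
    end: int         # end within the promoter, exclusive
    conserved: bool  # True if the site is aligned between human and rat maps


def snps_in_conserved_tfbs(snp_positions, sites):
    """Return {snp_position: [factors]} for SNPs inside conserved sites."""
    hits = {}
    for pos in snp_positions:
        factors = [s.factor for s in sites
                   if s.conserved and s.start <= pos < s.end]
        if factors:
            hits[pos] = factors
    return hits


# Made-up sites on a 1000 bp promoter, for illustration only.
sites = [
    Tfbs("TF_A", 100, 110, True),
    Tfbs("TF_B", 105, 117, False),
    Tfbs("TF_C", 400, 408, True),
]
print(snps_in_conserved_tfbs([107, 115, 407, 900], sites))
```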
SNPs in promoter DNA Triplexes
DNA triplexes are formed when a polypurine-rich DNA duplex binds a single-stranded polynucleotide. Sequences of more than 10 consecutive purines (A, G) or pyrimidines (T, C) are considered potential Triplex Target Sequences. SNPs located in DNA triplexes are believed to affect triplex formation and disrupt gene regulation.
Related article: Goni et al. 2004
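The length criterion above can be sketched with a regular expression. This is a minimal illustration, assuming "longer than 10" means at least 11 consecutive bases and that only one strand is scanned (a purine tract on one strand is a pyrimidine tract on the other); the function names are hypothetical.

```python
import re

MIN_LENGTH = 11  # "longer than 10" read as at least 11 bases (assumption)
TTS_PATTERN = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (MIN_LENGTH, MIN_LENGTH))


def triplex_target_sequences(seq):
    """Return (start, end) spans (0-based, end exclusive) of candidate tracts."""
    return [(m.start(), m.end()) for m in TTS_PATTERN.finditer(seq.upper())]


def snps_in_tts(seq, snp_positions):
    """Return the SNP positions that fall inside a candidate tract."""
    tracts = triplex_target_sequences(seq)
    return [p for p in snp_positions
            if any(start <= p < end for start, end in tracts)]


# Made-up promoter fragment containing a 13-base purine tract.
promoter = "TC" + "AG" * 6 + "A" + "CTAG"
print(triplex_target_sequences(promoter))
print(snps_in_tts(promoter, [0, 5, 16]))
```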