- Chaining Sequence/Structure Seeds for Computing RNA Similarity
Bourgeade Laetitia, Julien Allali and Cédric Chauve
Motivation: The increasing understanding of the important roles played by non-coding RNAs in many cellular processes has motivated an intensive research on the accurate and efficient comparison of large sets of RNA genes using both sequence and secondary structure information.
Results: We describe a new method to compare a query RNA with a static set of target RNAs. Our method is based on (i) a static indexing of the sequence/structure seeds of the target RNAs, (ii) searching the target RNAs by detecting seeds of the query present in the target, chaining these seeds in promising candidate homologs, then (iii) completing the alignment using an anchor-based exact alignment algorithm. We apply our method on the benchmark Bralibase 2.1 and compare its accuracy and efficiency with the exact method LocaRNA and its recent seeds-based speed-up ExpLoc-P. Our pipeline RNA-unchained computes constraints that greatly improve computation time of LocaRNA and is comparable to the one of ExpLoc-P, while improving the overall accuracy of the final alignments.
Availability: RNA-unchained is available at http://bioinfo.labri.fr/tools/RNA-unchained
- Ab initio Prediction of RNA Nucleotide Interactions with Backbone k-Tree Model
Liang Ding, Xingran Xue, Sal Lamarca, Mohammad Mohebbi, Abdul Samad, Russell L. Malmberg and Liming Cai
Recently there have been many revelations of importance of RNAs to cellular functions and thus fast-growing interests in computational prediction of RNA tertiary, which is a significant challenge nevertheless. Even for a short RNA sequence, the space of tertiary conformations is immense; existing methods to identify native-like conformations mostly resort to random sampling of conformations for computation feasibility. To avoid compromise in accuracy, such methods may also be guided with knowledge of local tertiary motifs or secondary structure. However, due to lack of effective, global constraints on the RNA tertiary structure, existing methods have yet to deliver the desired prediction accuracy for RNAs of length beyond 50.
In this work, we approach the ab initio tertiary structure prediction problem with a key step to predict nucleotide interactions that constitute the desired tertiary structure. Our work is established on a novel graph model, called k-tree, to constrain nucleotide interaction relationships in RNA tertiary structure. We show that, effectively guided by the k-tree model, a set of nucleotide interactions can be optimally and efficiently predicted from the query sequence. Our work also benefits from recent studies that revealed intrinsic and detailed roles of nucleotide interactions in the tertiary structure, including the fine-grained geometric nomenclatures and rich families of nucleotide interactions. Test results demonstrate that our method can predict with a high accuracy nucleotide interactions that constitute the tertiary structure of the query sequence, rendering a feasible solution toward ab initio prediction of tertiary structures.
- Observations on the Feasibility of Exact Pareto Optimization
Saule Cedric and Giegerich Robert
Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all candidates whose scores are not dominated by others in at least one objective. This gives, in a precise sense, better information that an artificial amalgamation of different scores into a single objective, but is more costly to compute.
We define a general Pareto product operator *Par on scoring schemes. Independent of a particular algorithm, we prove that for two scoring schemes A and B used in dynamic programming, the scoring scheme A *Par B correctly performs Pareto optimization over the same search space. We show that a Pareto-eager implementation of dynamic programming can achieve the same asymptotics as a single-objective optimization which computes the same number of results. For a concrete application in RNA structure prediction, we show that the empirical size of the Pareto front remains within reasonable bounds. Without artificial amalgamation of objectives, and with no heuristics involved, Pareto optimization is faster than computing the same number of answers separately for each objective.
- Searching for alternate RNA structures in genomic sequences
Azadeh Saffarian, Mathieu Giraud and Helene Touzet
We introduce the concept of RNA multi-structures, that is a formal grammar-based framework specifically designed to model a set of alternate RNA secondary structures. Such alternate structures can either be a set of suboptimal foldings, or distinct stable folding states, or variants within an RNA family. We provide several such examples and propose an efficient algorithm to search for RNA multi-structures within a genomic sequence.
Jan Gorodkin, Center for Non-Coding RNA in Technology and Health, University of Copenhagen, Danemark
Searching for SNPs disrupting RNA secondary structures
Single Nucleotide Polymorphisms (SNPs) can have large impact on diseases as well as phenotypic traits. Traditionally, SNPs have been studied in protein coding sequence and lately also in regulatory elements such as transcription factor binding sites. Since phenotypic SNPs are widespread in the genome it is of equal interest to search for their impact everywhere including in RNA structure in transcriptomic sequence. Studying the potential impact of, for example, SNPs in coding sequence takes outset in non-synonymous changes and these have then further been used to study structure disruptions which then again are used to imply functional changes. In contrast, studying SNPs for structure disrupting potential in RNA is more complex, because longer range base pairings often are involved. A number of strategies have been employed to address this, but they have mainly considered the RNA sequence globally, and thus local changes in large sequence can be harder to detect. We address this by constructing an approach, RNAsnp, which considers the sequences locally from globally computed base pair probabilities in either the full sequence or in sliding windows. Our approach compares the wild-type and mutant sequences and search for the region which maximizes the difference in base pair probabilities using a given distance measure. Furthermore, we compute mutation effects by empirical p-values. On the analysis of disease associated SNPs in UTRs we obtain substantially more candidates (20 vs. 3) than obtained by a global strategy on a set of 501 diseases associated SNPs. In a further study of cancer associated Single Nucleotide Variants (SNVs), we combined prediction of disrupted local RNA secondary structure and microRNA targets. We analyzed existing transcriptome data from patients with non-small cell lung cancer (NSCLC). In the original set, aimed at finding non-synomous SNVs, ~40% of the in total (somatic and germ-line) 73,717 SNVs overlap UTRs. Of 29290 SNVs in UTRs of 6462 genes, we predict 962 (408, local RNA structure; 490, miRNA targets) disruptive SNVs in 803 different genes. Of these 188 (23.4%) were previously known to be cancer associated, which is significantly higher (p=0.032) than the ratio of 1347 of 6462 in the full data set. This analysis can furthermore be used for network analysis indicating where the disruptive SNVs appear. RNAsnp is available as standalone software and as webserver at http://rth.dk/resources/rnasnp.
Mihaela Zavolan, Biozentrum, University of Basel, Switzerland
Deciphering the regulatory functions of miRNAs
miRNAs are small RNAs that guide Argonaute proteins to target mRNAs through perfect complementarity involving 7-8 nucleotides of the miRNA’s 5’end. These ‘canonical’ targets typically undergo degradation and translational inhibition. However, recent studies have suggested that miRNA target degradation can give rise to additional behaviors. These unclude the threshold-linear response of the targets to their transcriptional induction, reduction of the ‘noise’ in target expression and induction of correlations in the expression of the targets of a given miRNA. Here I will discuss our combined, experimental and computational, approaches towards predicting miRNA targets and characterizing the functional impact of miRNAs.