Genetic_Programming_Theory_and_Practice_XIII

(C. Jardin) #1

Learning Heuristics for Mining RNA Sequence-Structure Motifs 25



  1. We present a novel approach the task of mining RNA sequence-structure motifs
    with possible pseudoknots.

  2. While previous approaches artificially divide the mining task into conserved
    stem identification followed by the assembly of such stems into complex
    structural motifs, we present thefirstapproach for preforming the two tasks
    simultaneously. We believe this unified approach offers a more informative
    model for deciphering the evolution of functional RNAs, where the sets of stems
    comprising a conserved motif co-evolve as a correlated functional unit.

  3. By using a more efficient search we are able to use a denser graph with more
    nodes (stems) and edges (relations between stems). Thus, we are able to remove
    the necessity of previous algorithms to sparsify the graph, at the expense of
    sensitivity, by imposing assumptions and rigid limitations.

  4. The use of hyper heuristics enables the mining algorithm to find the exact
    conditions (i.e., biological indicators) regardingwhento apply each heuristic,
    or combinations.

  5. We push the limit of what has been done with evolution further mining
    RNA sequence-structure motifs one of the most difficult domains to which
    evolutionary algorithms have been applied to date.

  6. Along the way we devise several novel heuristics for this domain. The methodol-
    ogy of creating these heuristics could be applied to other biological domains and
    other.

  7. By devising novel heuristics and evolving them into hyper-heuristics, we con-
    tinue our previous presentation of a new framework for solving many (non-
    admissible) heuristic problems, which proved to be efficient and successful.

  8. We transform a non-search domain into a state graph, and thus strengthen the
    bridge between the learning and search community to the biological world.
    The rest of the paper is organized as follows: In the next section we examine
    previous and related work. In Sect. 3 we describe our method. Finally, we end with
    concluding remarks and future work in Sect. 4.


2 Previous Work


2.1 Mining Common Structure Among a Set of Unaligned


RNA Sequences


Several approaches exist for identifying common structure among a given set
of RNA molecules. The first approach (Pederson et al. 2006 ; Hofacker et al.
2002 ; Washietl and Hofacker 2004 ) is to align sequences using standard multiple
sequence alignment tools (e.g., ClustalW, Thompson et al. 1994 ), then use structure-
conserving mutations for the inference of a consensus structure. However, in order
for this approach to work the multiple sequence-alignment step needs to rely on

Free download pdf