Genetic_Programming_Theory_and_Practice_XIII


Learning Heuristics for Mining RNA Sequence-Structure Motifs


3.3.1 Stem Edge Features


As described in Sect. 2.1, Ji et al. (2004) defined a fixed equation for describing
similarity between two stems, which was a combination of five features. We use
these features, as well as the equation, as part of our stem edge features (SEF),
along with additional features described below.
Some of our features are added twice: once as is (i.e., $f_l(i_x, j_y)$), and once
divided by the energy, using the formula
$\bar{f}_l(i_x, j_y) = \frac{2 f_l(i_x, j_y)}{2 + r_x(i) + r_y(j)}$ (as described by Ji et al.).
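The energy normalization can be illustrated with a minimal sketch. It assumes the formula reads $\bar{f}_l = 2 f_l / (2 + r_x(i) + r_y(j))$, that each raw feature is already a numeric value, and that $r_x(i)$ and $r_y(j)$ are the energy terms of the two stems being compared. The function names `normalize_feature` and `raw_and_normalized` are hypothetical and do not come from the original work.

```python
def normalize_feature(f_l: float, r_x_i: float, r_y_j: float) -> float:
    """Energy-normalized feature: 2 * f_l / (2 + r_x(i) + r_y(j)).

    f_l   -- raw feature value f_l(i_x, j_y) for a pair of stems
    r_x_i -- energy term r_x(i) of stem i in sequence x (assumed numeric)
    r_y_j -- energy term r_y(j) of stem j in sequence y (assumed numeric)
    """
    denom = 2.0 + r_x_i + r_y_j
    if denom == 0.0:
        # The normalized feature is undefined when the denominator vanishes.
        raise ValueError("2 + r_x(i) + r_y(j) is zero; cannot normalize")
    return 2.0 * f_l / denom


def raw_and_normalized(f_l: float, r_x_i: float, r_y_j: float) -> tuple[float, float]:
    """Return the feature 'as is' together with its energy-normalized twin."""
    return f_l, normalize_feature(f_l, r_x_i, r_y_j)
```

For instance, `raw_and_normalized(3.0, 1.0, 1.0)` yields the pair `(3.0, 1.5)`, which mirrors how a raw feature and its normalized counterpart enter the feature list side by side.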


The complete list of our features is described in Table 2.

Table 2 The list of features

| Feature | Origin | Description |
|---|---|---|
| $f_1$: Helix length | [1] | Number of base-pairs in the stem |
| $f_2$: $f_2 = \bar{f}_1$ | [1] | [*] |
| $f_3$: Helix sequence | [1] | The sequence of bases in the stem |
| $f_4$: $f_4 = \bar{f}_3$ | [1] | [*] |
| $f_5$: Loop sequence | [1] | The sequence of letters between the innermost base-pair in the stem |
| $f_6$: $f_6 = \bar{f}_5$ | [1] | [*] |
| $f_7$: Stem stability | [1] | The free energy value of the stem |
| $f_8$: $f_8 = \bar{f}_7$ | [1] | [*] |
| $f_9$: Relative positions | [1] | The position of the left base in the outermost base-pair in the stem (relative to the sequence) |
| $f_{10}$: $f_{10} = \bar{f}_9$ | [1] | [*] |
| $f_{11}$: Ji et al. similarity | [1] | $\frac{2 \sum_{l=1,3,5,7} w_l f_l(i_x, j_y)}{2 + r_x(i) + r_y(j)}$ |
| $f_{12}$: Context | [3] | The shift between both helix counterparts of a stem in the anchor or non-anchor region |
| $f_{13}$: StemSearch similarity | [2] | Similarity score used by StemSearch, determined by structural and sequential similarity |

[1] Features taken from Ji et al. (2004)
[2] Features taken from Milo et al. (2014)
[3] Features designed by us
[*] $\bar{f}_l(i_x, j_y) = \frac{2 f_l(i_x, j_y)}{2 + r_x(i) + r_y(j)}$