COMPUTATIONAL MODELING AND SIMULATION AS ENABLERS FOR BIOLOGICAL DISCOVERY 135
to that component, and (3) creating classification models by comparing measurements of structures
known to contain the motif with measurements of structures known not to contain it. In this
case, the conserved component chosen was the recognition helix (i.e., the alpha helix that makes
sequence-specific contact with DNA), and the two types of relevant measurements were the hydrophobic
area of interaction between secondary structure elements (SSEs) and the relative solvent accessibility
of SSEs.
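Step (3), the comparison of motif-containing and motif-lacking structures, can be illustrated with a toy classifier. This is a sketch under stated assumptions, not the authors' method. The source does not name the classification algorithm, so a nearest-centroid rule on standardized features is assumed here. Each structure is reduced to two numbers measured around its recognition helix: the hydrophobic interaction area with neighboring SSEs and the relative solvent accessibility. The names `HelixFeatures` and `NearestCentroidMotifModel` are hypothetical, and the feature values in the usage example are invented for illustration.

```python
"""Toy motif classifier: nearest centroid on two standardized features.

Assumption (not from the source): a nearest-centroid rule stands in for
whatever classification model was actually used.
"""
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pstdev


@dataclass
class HelixFeatures:
    # Hypothetical: hydrophobic contact area between the helix and other SSEs (assumed in square angstroms).
    hydrophobic_area: float
    # Hypothetical: relative solvent accessibility of the helix, as a fraction in [0, 1].
    relative_accessibility: float

    def as_vector(self):
        return (self.hydrophobic_area, self.relative_accessibility)


def _dist(a, b):
    # Euclidean distance between two equal-length vectors.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroidMotifModel:
    def fit(self, positives, negatives):
        """Learn one centroid for motif structures and one for non-motif structures."""
        vecs = [f.as_vector() for f in positives + negatives]
        dims = range(len(vecs[0]))
        # Standardize each feature so area (large numbers) and accessibility (0..1) weigh equally.
        self.means = [mean(v[d] for v in vecs) for d in dims]
        # Fall back to 1.0 if a feature has zero spread, to avoid dividing by zero.
        self.scales = [pstdev([v[d] for v in vecs]) or 1.0 for d in dims]
        self.pos_centroid = self._centroid(positives)
        self.neg_centroid = self._centroid(negatives)
        return self

    def _standardize(self, vec):
        return [(x - m) / s for x, m, s in zip(vec, self.means, self.scales)]

    def _centroid(self, feats):
        rows = [self._standardize(f.as_vector()) for f in feats]
        return [mean(col) for col in zip(*rows)]

    def score(self, feat):
        """Positive when the structure lies closer to the motif centroid."""
        z = self._standardize(feat.as_vector())
        return _dist(z, self.neg_centroid) - _dist(z, self.pos_centroid)

    def predict(self, feat):
        return self.score(feat) > 0


if __name__ == "__main__":
    # Invented training data, for illustration only.
    motif = [HelixFeatures(420.0, 0.35), HelixFeatures(390.0, 0.30), HelixFeatures(450.0, 0.40)]
    non_motif = [HelixFeatures(120.0, 0.70), HelixFeatures(150.0, 0.65), HelixFeatures(100.0, 0.75)]
    model = NearestCentroidMotifModel().fit(motif, non_motif)
    print(model.predict(HelixFeatures(410.0, 0.33)))  # True
    print(model.predict(HelixFeatures(130.0, 0.72)))  # False
```

Standardizing first keeps the buried-area feature, whose values run into the hundreds, from swamping the accessibility fraction. A real model would be trained on many more curated structures and would likely use a stronger classifier.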
Once the classification model was built, the entire Protein Data Bank of experimentally determined
structures was searched, and new examples of the motif were found that show no detectable sequence
homology to previously known examples. Two such examples are Esa1 histone acetyltransferase and
isoflavone 4-O-methyltransferase. The result highlights an important strength of the approach:
sequence-based methods for identifying a functional class of proteins can be supplemented by a
classification model built on three-dimensional structural information.
FIGURE 5.1 The ZDOCK/RDOCK prediction for dockerin (in red) superposed on the crystal structure for CAPRI
Target 13, cohesin/dockerin. SOURCE: Courtesy of Brian Pierce and Zhiping Weng, Boston University.