Computational Systems Biology Methods and Protocols

protein O-GlcNAcylation sites with respect to their datasets, feature
extraction methods, and classifier algorithms. We also discuss future
challenges and outstanding questions.

2 Material and Methods


Computational prediction of protein O-GlcNAcylation sites can be
formulated as a two-class classification problem: each candidate
residue is predicted to be either O-GlcNAcylated or not. The overall
workflow of the prediction method is summarized in Fig. 2. It consists
of three main components: dataset construction and preprocessing,
sequence feature representation and selection, and prediction
algorithms. Below we review six computational predictors of
O-GlcNAcylation sites [6–11] with respect to these three components.
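To make the two-class formulation concrete, the sketch below strings the three components together on toy data. It is not the method of any of the reviewed predictors: amino acid composition (AAC) stands in for the feature representation step and a nearest-centroid rule stands in for the prediction algorithm, both chosen only because they fit in a few lines of standard-library Python. The function names `aac_features`, `centroid`, `sq_dist`, `train`, and `predict` are hypothetical, and each sample is assumed to be a short peptide window centred on a candidate residue.

```python
from collections import Counter

# The 20 standard amino acids; the order fixes the feature-vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(window):
    """Amino acid composition: count of each standard residue divided by
    the window length. Other symbols (e.g. an 'X' padding character)
    get no feature of their own but still count toward the length."""
    counts = Counter(window)
    n = len(window)
    return [counts[aa] / n for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(windows, labels):
    """Fit a nearest-centroid model: one mean vector per class
    (label 1 = O-GlcNAcylated, label 0 = not modified)."""
    pos = [aac_features(w) for w, y in zip(windows, labels) if y == 1]
    neg = [aac_features(w) for w, y in zip(windows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative window")
    return centroid(pos), centroid(neg)


def predict(model, window):
    """Return 1 if the window is closer to the positive centroid, else 0."""
    pos_c, neg_c = model
    x = aac_features(window)
    return 1 if sq_dist(x, pos_c) < sq_dist(x, neg_c) else 0
```

For example, `train(["PSTSVAP", "APSTSPV", "KDEKRDE", "EKDRKEK"], [1, 1, 0, 0])` builds a model from two made-up positive and two made-up negative windows; `predict` then assigns `"SPTVSAP"` to class 1 and `"DKEERKD"` to class 0. The toy windows are invented for illustration and carry no biological meaning.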

2.1 Dataset Construction and Preprocessing


It is crucial to construct a high-quality benchmark dataset for
unbiased performance evaluation. Datasets for predicting protein
O-GlcNAcylation sites are generally compiled from the
UniProtKB/Swiss-Prot database [12], dbPTM [13], dbOGAP [6],
O-GlycBase [14], PhosphoSitePlus [13], and the PubMed literature.
Experimentally verified O-GlcNAcylation sites are used as the
positive dataset.
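The dataset step can be sketched as turning each protein sequence into labelled samples. This minimal sketch assumes (i) that every serine or threonine (the residues O-GlcNAc is attached to) is a candidate site, (ii) that each candidate is represented as a fixed-length window centred on the residue, padded with a placeholder `X` at the sequence ends, and (iii) that S/T residues outside the verified set are labelled negative. The text above only states how positives are defined, so (iii) in particular is an assumption, as are the function name `extract_windows` and its `half_width` and `pad` parameters. Verified positions gathered from several of the sources listed above can be passed in as a single merged set.

```python
def extract_windows(sequence, verified_sites, half_width=7, pad="X"):
    """Turn one protein sequence into (position, window, label) samples.

    sequence:       protein sequence in one-letter code
    verified_sites: set of 1-based positions of experimentally verified
                    O-GlcNAcylation sites (the positive samples)
    half_width:     residues kept on each side of the centre (hypothetical
                    default; window length is 2 * half_width + 1)
    pad:            hypothetical placeholder used beyond the sequence ends

    Assumption: every other S/T residue is labelled 0 (negative).
    """
    # Pad both ends so windows near the termini keep the full length.
    padded = pad * half_width + sequence + pad * half_width
    samples = []
    for i, residue in enumerate(sequence):
        if residue not in "ST":  # only Ser/Thr are candidate sites
            continue
        # Residue i sits at padded[i + half_width], so this slice centres it.
        window = padded[i : i + 2 * half_width + 1]
        position = i + 1  # convert to 1-based numbering
        label = 1 if position in verified_sites else 0
        samples.append((position, window, label))
    return samples
```

For the made-up sequence `"MSTAS"` with a verified site at position 2, `extract_windows("MSTAS", {2}, half_width=2)` returns `[(2, "XMSTA", 1), (3, "MSTAS", 0), (5, "TASXX", 0)]`.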

Fig. 2 The systematic flowchart of the O-GlcNAcylation prediction method

Computational Prediction of Protein O-GlcNAc Modification 237