Computational Systems Biology Methods and Protocols.7z

protein O-GlcNAcylation sites with respect to their datasets, feature extraction methods, and classifier algorithms. We also discuss the future challenges and outstanding questions.

2 Material and Methods

Computational prediction of protein O-GlcNAcylation sites can be formulated as a two-class classification problem. The overall workflow of the prediction method is summarized in Fig. 2. It consists mainly of three components: dataset construction and preprocessing, sequence feature representation and selection, and prediction algorithms. We discuss six computational predictors of O-GlcNAcylation sites [6–11] with respect to these three aspects.
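The two-class formulation can be illustrated with a minimal sketch in Python. It is not the method of any of the six predictors: the binary (one-hot) encoding and the nearest-centroid classifier are stand-ins chosen for brevity, and the peptide windows and labels are hypothetical toy data.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(window: str) -> List[float]:
    """Binary-encode a peptide window; unknown characters map to all zeros."""
    vec: List[float] = []
    for aa in window:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy two-class classifier (an assumption, not a published predictor)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        self.pos = centroid([x for x, label in zip(X, y) if label == 1])
        self.neg = centroid([x for x, label in zip(X, y) if label == 0])
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        # Class 1 (modified site) if the sample is closer to the positive centroid.
        return [1 if sq_dist(x, self.pos) < sq_dist(x, self.neg) else 0 for x in X]


# Hypothetical windows centred on a Ser/Thr residue; 1 = modified, 0 = unmodified.
train_windows = ["PVSTA", "PVSSA", "KKSEE", "KRTEE"]
train_labels = [1, 1, 0, 0]

model = NearestCentroid().fit([one_hot(w) for w in train_windows], train_labels)
predictions = model.predict([one_hot("PVSAA"), one_hot("KKTEE")])
```

Here each candidate residue becomes one sample, its surrounding window becomes a feature vector, and the classifier assigns it to one of the two classes. Any feature representation or learning algorithm could be substituted for the two placeholders.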

2.1 Datasets Construction and Preprocessing

It is crucial to construct a high-quality benchmark dataset for unbiased performance evaluation. The datasets used to predict protein O-GlcNAcylation sites are generally constructed from the UniProtKB/Swiss-Prot database [12], dbPTM [13], dbOGAP [6], O-GlycBase [14], PhosphoSitePlus [13], and the PubMed literature. For O-GlcNAcylation site prediction, experimentally verified O-GlcNAcylation sites are defined as the positive dataset.
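As a rough illustration of how labeled samples might be derived from an annotated protein sequence, the sketch below extracts a fixed-length window around every serine or threonine and labels it positive when its position is in the set of verified sites. Several details are assumptions rather than statements from the studies discussed: the function name `extract_windows`, the flank length of 3, the padding character `X`, and in particular the labeling of unannotated Ser/Thr residues as negatives. The sequence and site positions are toy data.

```python
from typing import List, Set, Tuple


def extract_windows(
    sequence: str,
    verified_sites: Set[int],
    flank: int = 3,   # residues on each side of the candidate (assumed value)
    pad: str = "X",   # padding for windows that run past the termini (assumed)
) -> List[Tuple[int, str, int]]:
    """Return (1-based position, window, label) for every Ser/Thr in the sequence."""
    padded = pad * flank + sequence + pad * flank
    samples: List[Tuple[int, str, int]] = []
    for i, residue in enumerate(sequence):
        if residue not in "ST":
            continue  # only Ser/Thr are candidate O-GlcNAcylation sites
        position = i + 1
        # In the padded string the candidate sits at index i + flank,
        # so the window spans padded[i : i + 2 * flank + 1].
        window = padded[i : i + 2 * flank + 1]
        # Positive if experimentally verified; negative otherwise (assumption).
        label = 1 if position in verified_sites else 0
        samples.append((position, window, label))
    return samples


# Toy protein and hypothetical verified positions (not real annotations).
toy_sequence = "MASTPKLSGT"
toy_sites = {3, 8}
samples = extract_windows(toy_sequence, toy_sites)
```

For the toy input this yields four samples, two positive (positions 3 and 8) and two negative (positions 4 and 10), each a seven-residue window centered on the candidate residue.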

Fig. 2 The systematic flowchart of the O-GlcNAcylation prediction method

Computational Prediction of Protein O-GlcNAc Modification 237
