proteins and the prediction of their topologies because most
membrane proteins have poor structural information in the available
databases [34]; and an SVM classifier has been used to investigate the
functional commonality and sequence homology of helical
antimicrobial peptides; the classifier can detect membrane activity in
peptide sequences, that is, their capacity to penetrate the membranes
of microbes [6]. Three is to
select the key functional residues in sequences: to identify
glycosylation sites, which otherwise requires expensive and laborious
experimental research, a bioinformatic tool called GlycoMine, based on
the random forest algorithm, is used for the systematic in silico
identification of three types of glycosylation sites in the human
proteome [52]; and to
predict catalytic residues from 3-D structures, partial order
optimum likelihood (POOL) uses machine learning strategies
to combine electrostatic and geometric information, enhancing
site prediction when sequence homologues are available; the approach
is especially applicable to proteins with novel folds and to engineered
proteins [37]; and to identify residues that interact with ligands for
designing small molecules that interact with a target protein, a
sequence-based method called LIBRUS combines homology-based
transfer and direct prediction by a support vector machine
[53]; and similarly, DISLOCATE, a two-step method based
on machine learning models, has been developed for predicting both the
bonding state and the connectivity patterns of cysteine residues in a
protein chain; its overall performance improves in particular
when features such as protein subcellular localization are
included [54].
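Many sequence-based site predictors of the kind surveyed above frame the task as classifying a fixed-length window centred on each candidate residue. The following is a minimal sketch of that framing for N-linked glycosylation, assuming the canonical N-X-S/T sequon (X not proline) as the candidate filter and a plain one-hot window encoding. The function names, the padding symbol, and the flank width are hypothetical choices for illustration; they are not taken from GlycoMine, LIBRUS, or DISLOCATE, whose actual feature sets are richer.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for positions beyond the chain ends


def candidate_sequons(sequence):
    """Return 0-based positions of Asn in N-X-[S/T] motifs where X is not Pro."""
    # A lookahead is used so that overlapping motifs are all reported.
    return [m.start() for m in re.finditer(r"N(?=[^P][ST])", sequence)]


def window(sequence, center, flank=3):
    """Return the residues within `flank` positions of `center`, padded at the ends."""
    padded = PAD * flank + sequence + PAD * flank
    return padded[center:center + 2 * flank + 1]


def one_hot(window_seq):
    """Flatten a window into a 0/1 vector with 20 entries per position.

    Padding symbols (and any non-standard residue) encode as all zeros.
    """
    vector = []
    for residue in window_seq:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def featurize(sequence, flank=3):
    """Map each candidate site to (position, window, feature vector)."""
    sites = []
    for pos in candidate_sequons(sequence):
        win = window(sequence, pos, flank)
        sites.append((pos, win, one_hot(win)))
    return sites


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    for pos, win, vec in featurize("MKNGSANPTQNVT"):
        print(pos, win, len(vec))
```

The resulting vectors could be passed, together with experimentally determined labels, to any standard classifier such as a random forest or an SVM; in practice, published tools typically add evolutionary, structural, or functional features on top of such a window encoding.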
3.1.2 Application in Image-Focused Analysis
Image analysis is an essential component of many biological
experiments across multiple scales. For example, firstly, at the
molecule level, StarryNite performs automatic recognition of
fluorescently labeled cells and traces their lineage, where an SVM
classifier helps to decide whether StarryNite's output is correct,
reducing the time required for correcting errors [55]; and two-dimensional
gel electrophoresis (2-DE) is the protein separation method used in
expression proteomics, where 2-DE gel image analysis remains a
serious bottleneck, so a hierarchical machine learning-based
segmentation methodology has been proposed to improve
sensitivity and precision simultaneously [56]. Secondly, at the
molecular network level, automated image analysis can
effectively score many phenotypes, and a supervised machine
learning approach with iterative feedback can be used to readily
score phenotypes in high-throughput image-based screens, replacing
traditional screening by subjective visual inspection, which can
speed up the discovery of biological pathways [7]. Thirdly, at the
cell level, previous live-cell imaging studies suggested that
clathrin-mediated endocytosis (CME) is inefficient when cells
internalize molecules; a genome editing and machine learning method is
190 Xiang-tian Yu et al.