3.1.5 Application in Document-Focused Analysis
Machine learning taggers are in great demand for extracting clinical
concepts from medical documents. For instance, BioTagger-GM was
developed on the basis of machine learning taggers for the automated
detection of gene/protein names in the biological domain, and it
can be further improved by training on datasets from multiple
data sources [76]. To extract clinical entities (e.g., medical
problems, tests, and treatments, as well as their asserted status)
from hospital discharge summaries, a hybrid clinical entity extraction
system using conditional random fields has been implemented for
mining clinical text [77]. To collate coreferent chains of
concepts from a corpus of clinical documents, a machine learning
approach based on graphical models was employed to recognize
and cluster coreferent concepts, which is applicable to the
assembly of problem and medication lists from clinical documents
[78]. Beyond entity extraction, reported molecular mechanisms can also
be detected and collected by mining clinical documents. For example,
when potential participants and clinicians search for gene-associated
clinical trials, automated methods are key to extracting genetic
information from narrative trial documents; a two-stage machine
learning-based approach, serving as an information retrieval tool
for gene-associated clinical trials, has been applied to identify
genes and genetic lesion statuses in clinical trial documents held
in a cancer clinical trial database [79]. Likewise, because
identifying drug side effects in free text is key to developing
up-to-date knowledge sources on adverse drug reactions, an
intelligent system combining machine learning with rule- and
knowledge-based approaches has been used to identify such
drug side effects from the literature [80].
3.2 New Development of Machine Learning in Omics Data Analysis
Along with the development of high-throughput technologies in
different fields of biology [81, 82], machine learning has entered
a new stage of development that addresses new data and new problems
[83–86], rather than merely enhancing conventional methodology
[87, 88].
3.2.1 Data Mining in Omics Data Analysis
In line with the adoption of high-throughput approaches, the first
step concerns sequencers such as the Illumina Genome Analyzer, which
can generate millions of short reads: preprocessing packages (e.g.,
Ibis: Improved base identification system) provide efficient base
callers that increase the number of usable reads by reducing the
error rate [89]. Next, precise genome annotation is necessary to
define genomic segments accurately, and annotation (e.g.,
recognizing exons and introns in unspliced mRNA) is expected to
improve with modern machine learning techniques (e.g., support
vector machines and label sequence learning) [90]. To gain a
detailed understanding of the dynamically composed transcription
unit structures,
strand-specific RNA-seq datasets are collected to derive the