100 5 Survey of Ontologies in Bioinformatics
BSML http://www.bsml.org
The Bioinformatic Sequence Markup Language (BSML) is a language that
encodes biological sequence information, which encompasses graphical rep-
resentations of biologically meaningful objects such as nucleotide or protein
sequences. The current version (released in 2002) is BSML v3.1. BSML takes
advantage of XML features for encoding hierarchically organized informa-
tion to provide a representation of knowledge about biological sequences.
BSML is useful in capturing the semantics of biological objects (e.g., com-
plete genome, chromosome, regulatory region, gene, transcript, gene prod-
uct, etc.). BSML can be rendered in the Genomic XML viewer, which greatly
facilitates communications among biologists, since biologists are accustomed
to visualizing biological objects and to communicating graphically about
these objects and their annotations.
The root element of a BSML document is tagged with Bsml. Consequently,
a BSML document should look like the following:
<?xml version="1.0"?>
<!DOCTYPE Bsml SYSTEM
"http://www.labbook.com/dtd/bsml2_2.dtd">
<Bsml>
...
</Bsml>
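As a rough illustration of the root-element requirement, the following Python sketch checks that a document is well-formed XML whose root element is Bsml. The helper name is_bsml_document and the variable names are hypothetical, and the skeleton uses a SYSTEM identifier for the DTD as an assumption so that the standard-library parser accepts the declaration; the DTD itself is not fetched or validated.

```python
import xml.etree.ElementTree as ET

# Minimal skeleton mirroring the listing above. The SYSTEM form of the
# DOCTYPE is an assumption made so the declaration parses; the DTD at
# this URL is never downloaded by ElementTree.
SKELETON = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE Bsml SYSTEM "http://www.labbook.com/dtd/bsml2_2.dtd">\n'
    '<Bsml>\n'
    '</Bsml>\n'
)


def is_bsml_document(xml_text):
    """Return True if xml_text is well-formed XML with root element Bsml.

    This checks well-formedness and the root tag only; it does not
    validate the document against the BSML DTD.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == "Bsml"


skeleton_ok = is_bsml_document(SKELETON)   # root element is Bsml
other_ok = is_bsml_document("<Other/>")    # well-formed, but wrong root
```

A check of this kind only confirms the outermost structure. Validating the content of the document against the DTD would require a validating parser, which the Python standard library does not provide.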
BSML is primarily concerned with DNA, RNA, and protein. Information in
a BSML document belongs primarily to one of two broad categories: “se-
quence data” and “sequence annotation.”
- Sequence data. The primary sequence data of the molecule of interest are
contained within the sequence element; information about the sequence
is represented using attributes and their associated values, as defined in
the BSML DTD. Figure 5.5 shows an example of using BSML to represent the
amino acid sequence of human tumor suppressor p53.
- Sequence annotation. Sequence annotation refers to information about a
particular sequence that goes beyond the sequence data themselves.
Annotations are of several types: positional annotation, qualitative
annotation, quantitative annotation, and referential annotation.
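To make the "sequence data" category more concrete, the sketch below reads sequence records from a small BSML-like fragment in Python. It is an illustration only: the element and attribute names (Definitions, Sequences, Sequence, Seq-data, id, title, molecule, length) are assumptions that have not been checked against the BSML DTD, the residue string is a placeholder rather than a real protein, and read_sequences is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical BSML-like fragment. Element and attribute names are
# assumptions for illustration; the residues are a placeholder peptide,
# not the p53 sequence of figure 5.5.
DOC = """<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="seq1" title="placeholder peptide"
                molecule="aa" length="10">
        <Seq-data>ACDEFGHIKL</Seq-data>
      </Sequence>
    </Sequences>
  </Definitions>
</Bsml>"""


def read_sequences(xml_text):
    """Collect the attributes and residues of every Sequence element."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("Sequence"):
        # Descriptive information is carried as attribute/value pairs.
        attributes = dict(seq.attrib)
        # The residues are taken from the (assumed) Seq-data child.
        data = next((c.text or "" for c in seq if c.tag == "Seq-data"), "")
        residues = "".join(data.split())  # drop any whitespace
        records.append({"attributes": attributes, "residues": residues})
    return records


records = read_sequences(DOC)
```

In this sketch the descriptive attributes and the raw residues are kept apart in each record, which mirrors the distinction drawn above between what is stored as attribute values and the sequence data themselves.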