
50 2 XML Semantics


cctggacctc ctgtgcaaga acatgaaaca nctgtggttc ttccttctcc tggtggcagc 60
tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg ggcccaggac tggggaagcc 120
tccagagctc aaaaccccac ttggtgacac aactcacaca tgcccacggt gcccagagcc 180
caaatcttgt gacacacctc ccccgtgccc acggtgccca gagcccaaat cttgtgacac 240
acctccccca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc 300
nnngtgccca gcacctgaac tcttgggagg accgtcagtc ttcctcttcc ccccaaaacc 360
caaggatacc cttatgattt cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag 420
ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac ggcgtggagg tgcataatgc 480
caagacaaag ctgcgggagg agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac 540
cgtcctgcac caggactggc tgaacggcaa ggagtacaag tgcaaggtct ccaacaaagc 600
cctcccagcc cccatcgaga aaaccatctc caaagccaaa ggacagcccn nnnnnnnnnn 660
nnnnnnnnnn nnnnnnnnnn nnnnngagga gatgaccaag aaccaagtca gcctgacctg 720
cctggtcaaa ggcttctacc ccagcgacat cgccgtggag tgggagagca atgggcagcc 780
ggagaacaac tacaacacca cgcctcccat gctggactcc gacggctcct tcttcctcta 840
cagcaagctc accgtggaca agagcaggtg gcagcagggg aacatcttct catgctccgt 900
gatgcatgag gctctgcaca accgctacac gcagaagagc ctctccctgt ctccgggtaa 960
atgagtgcca tggccggcaa gcccccgctc cccgggctct cggggtcgcg cgaggatgct 1020
tggcacgtac cccgtgtaca tacttcccag gcacccagca tggaaataaa gcacccagcg 1080
ctgccctgg 1089

The sequence is divided into groups of 60 bases, and these groups are
divided into subgroups of 10 bases. A number follows each group of 60
bases and gives the running count of bases up to the end of that group.
The letter n is used when a base is not known.
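
The layout just described can be checked mechanically. The following
Python sketch is one possible reading of it, not part of the original
text. The name parse_dna is hypothetical, and the code assumes that the
alphabet is a, c, g, t plus n, and that the trailing number on each line
is the running total of bases.

```python
import re

# Assumption: a subgroup holds 1 to 10 characters from a, c, g, t, n.
SUBGROUP = re.compile(r"[acgtn]{1,10}")


def parse_dna(text):
    """Return the bases as one string, checking the grouped layout."""
    pieces = []
    total = 0
    for line in text.strip().splitlines():
        # Everything before the last field is a subgroup; the last is the count.
        *groups, count = line.split()
        if len(groups) > 6:
            raise ValueError(f"more than six subgroups: {line!r}")
        for group in groups:
            if not SUBGROUP.fullmatch(group):
                raise ValueError(f"bad subgroup: {group!r}")
        total += sum(len(group) for group in groups)
        # Assumption: the number is the count of bases read so far.
        if int(count) != total:
            raise ValueError(f"expected count {total}, found {count}")
        pieces.append("".join(groups))
    return "".join(pieces)
```

Calling parse_dna on the listing above would return the bases as a
single string with the spacing and counts removed, or raise ValueError
at the first line that does not fit the layout.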


  1. Define a datatype for an amino acid sequence (protein). Here is an
    example of such a sequence:


1 meepqsdpsv epplsqetfs dlwkllpenn vlsplpsqam ddlmlspddi eqwftedpgp
61 deaprmpeaa ppvapapaap tpaapapaps wplsssvpsq ktyqgsygfr lgflhsgtak
121 svtctyspal nkmfcqlakt cpvqlwvdst pppgtrvram aiykqsqhmt evvrrcphhe
181 rcsdsdglap pqhlirvegn lrveylddrn tfrhsvvvpy eppevgsdct tihynymcns
241 scmggmnrrp iltiitleds sgnllgrnsf evrvcacpgr drrteeenlr kkgephhelp
301 pgstkralpn ntssspqpkk kpldgeyftl qirgrerfem frelnealel kdaqagkepg
361 gsrahsshlk skkgqstsrh kklmfktegp dsd

Like DNA sequences, it is divided into groups of 60 amino acids, and
these groups are divided into subgroups of 10 amino acids. A number
precedes each group and gives the position of its first amino acid.
The letter x is used for an unknown amino acid.
The letters j, o, and u are not used for amino acids.
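
As an informal illustration of these constraints (not a solution in any
schema language), the rules can be written as a small Python value
type. The names ProteinSequence and from_text are hypothetical. The
sketch assumes that the leading number is the 1-based position of the
first amino acid in the group, and it does not check that subgroups
contain exactly 10 letters.

```python
import re
from dataclasses import dataclass

# Lowercase letters except j, o, and u; x stands for an unknown amino acid.
_RESIDUES = re.compile(r"[a-ik-np-tv-z]+")


@dataclass(frozen=True)
class ProteinSequence:
    residues: str

    def __post_init__(self):
        # Reject empty sequences and any letter outside the allowed alphabet.
        if not _RESIDUES.fullmatch(self.residues):
            raise ValueError("residues must be lowercase letters other than j, o, u")

    @classmethod
    def from_text(cls, text):
        """Build a sequence from the numbered, grouped layout."""
        parts = []
        count = 0
        for line in text.strip().splitlines():
            start, *groups = line.split()
            # Assumption: the number is the position of the line's first residue.
            if int(start) != count + 1:
                raise ValueError(f"expected position {count + 1}, found {start}")
            joined = "".join(groups)
            count += len(joined)
            parts.append(joined)
        return cls("".join(parts))
```

Keeping the validation in the constructor means that every
ProteinSequence value, however it was built, satisfies the alphabet
rule.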