to be captured within a very short timescale. More recently alternative methods of
analysis including high performance liquid chromatography based approaches have
gained in popularity, especially for DNA mutation analysis. Mass spectrometry is also
becoming increasingly used for nucleic acid analysis.
5.8 Molecular biology and bioinformatics
5.8.1 Basic bioinformatics
Bioinformatics is now an established and vital resource for molecular biology
research and is also a mainstay of the routine analysis of DNA. This growth in the use of
bioinformatics has been driven by the increase in genetic sequence information and
the need to store, analyse and manipulate the data. There are now a huge number of
sequences stored in genetic databases from a variety of organisms, including the
human genome. Indeed the genetic information from various organisms is now an
indispensable starting point for molecular biology research. The main primary
databases include GenBank at the National Institutes of Health (NIH) in the USA, EMBL at
the European Bioinformatics Institute (EBI) at Cambridge, UK and the DNA Database
of Japan (DDBJ) at Mishima in Japan. These databases contain the nucleotide
sequences, which are annotated to allow easy identification. There are also many other
databases, including secondary databases that contain information relating to sequence
motifs, such as the core sequences found in cytochrome P450 domains or DNA-binding
domains. Importantly all of the databases may be freely accessed over the internet.
A number of these important databases and internet resources are listed in Table 5.4.
Consequently the new expanding and exciting areas of bioscience research are those
that analyse genome and cDNA sequence databases (genomics) and also their protein
counterparts (proteomics). This is sometimes referred to as in silico research.
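
As a small illustration of the kind of in silico analysis that motif-based secondary
databases support, the following Python sketch scans a protein sequence for a
sequence motif written as a regular expression. It is only a sketch under stated
assumptions: the pattern is a simplified C2H2 zinc-finger-style motif (one kind of
DNA-binding domain) chosen for illustration, and the function name find_motifs and
the toy protein fragment are hypothetical and not taken from any database.

```python
import re

# Assumed, simplified C2H2 zinc-finger-style pattern (a DNA-binding motif):
# C, 2-4 residues, C, 3 residues, one of LIVMFYWC, 8 residues, H, 3-5 residues, H.
ZINC_FINGER_LIKE = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_motifs(protein, pattern=ZINC_FINGER_LIKE):
    """Return (start, end, matched_text) for each non-overlapping match.

    Positions are 0-based and the end position is exclusive.
    """
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(protein.upper())]


# Hypothetical protein fragment invented for this example.
toy_protein = "MKTACPECGKSFSQKSNLQKHQRTHTGEKL"
hits = find_motifs(toy_protein)
for start, end, text in hits:
    # Report 1-based residue numbers, as is usual for protein sequences.
    print(f"motif at residues {start + 1}-{end}: {text}")
```

Real secondary databases describe motifs in their own pattern or profile formats,
so a regular expression such as this one should be treated as an approximation
rather than as a database entry.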
5.8.2 Analysing information using bioinformatics
One of the most useful bioinformatics resources is termed BLAST (Basic Local
Alignment Search Tool) located at the NCBI (www.ncbi.nlm.nih.gov). This allows a DNA
sequence to be submitted via the internet in order to compare it to all the sequences
contained within a DNA database. This is very useful because, once a nucleotide
sequence has been determined by, for example, Sanger sequencing, it is possible to
identify similar sequences. Indeed if human sequences are used and have already been
mapped it is possible to locate their position to a particular chromosome using NCBI
Map Viewer. Further resources such as ORF (open reading frame) finder allow a
search to be undertaken for open reading frames, i.e. sequences beginning with a
start codon (ATG) and continuing with a significant number of ‘coding’ triplets before
a stop codon is reached. There are a number of other sequences that may be used to
define coding sequences; these include ribosome binding sites, splice site junctions,
polyadenylation signal sequences and promoter sequences that lie outside the coding