Environmental Microbiology of Aquatic and Waste Systems


3.5 The Open Reading Frame and the Identification of Genes


Once a gene has been sequenced, it is important to
determine the correct open reading frame. Every region
of DNA has six possible reading frames, three in each
direction, because a codon consists of three nucleotides.
The reading frame that is used determines which amino
acids will be encoded by a gene. Typically, only one
reading frame is used in translating a gene (in eukaryotes),
and this is often the longest open reading frame.
Once the open reading frame is known, the DNA
sequence can be translated into its corresponding
amino acid sequence.
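
The six reading frames described above can be enumerated mechanically. The following is a minimal sketch, not taken from any program named in this chapter: it splits a lower-case DNA string into codons at each of the three offsets on the forward strand and on its reverse complement. The function names and the 12-base example fragment are illustrative assumptions.

```python
# Complement of each base (lower-case DNA, as in the figures of this section).
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(dna: str, offset: int) -> list[str]:
    """Split dna into complete codons, starting at offset 0, 1 or 2."""
    return [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]


def six_frames(dna: str) -> dict[str, list[str]]:
    """Map a frame label such as '+1' or '-3' to its list of codons."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = codons(seq, offset)
    return frames


example = "atgcccaagctg"  # hypothetical fragment, for illustration only
for label, frame in six_frames(example).items():
    print(label, " ".join(frame))
```

Trailing bases that do not fill a complete codon are dropped, which is why the second and third frames of each strand can be one codon shorter than the first.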
For example, the sequence of DNA in Fig. 3.8 can
be read in six reading frames: three in the forward and
three in the reverse direction. The three reading frames
in the forward direction are shown with the translated
amino acids below each DNA sequence. Frame 1 starts
with the “a,” Frame 2 with the “t,” and Frame 3 with
the “g.” Stop codons are indicated by an “*” in the
protein sequence. The longest ORF is in Frame 1.
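
The reading of Fig. 3.8 can be checked with a short script. This is a minimal sketch under stated assumptions: it uses the standard genetic code, it treats the "longest ORF" simply as the longest stretch of amino acids uninterrupted by a stop codon, and the names CODON_TABLE, translate and longest_open_stretch are illustrative rather than part of any tool cited in this chapter.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order
# and "*" marking the three stop codons (taa, tag, tga).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Sequence of the hypothetical DNA fragment in Fig. 3.8.
FIG_3_8 = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"


def translate(dna: str, offset: int) -> str:
    """Translate every complete codon of dna, starting at offset 0, 1 or 2."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def longest_open_stretch(protein: str) -> int:
    """Length of the longest run of amino acids not interrupted by a stop."""
    return max(len(piece) for piece in protein.split("*"))


for offset in range(3):
    protein = translate(FIG_3_8, offset)
    print(offset + 1, protein, longest_open_stretch(protein))
```

Run on the Fig. 3.8 sequence, the sketch reports stop-free stretches of 17, 10 and 7 amino acids for Frames 1, 2 and 3, consistent with the statement that the longest ORF is in Frame 1.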
Genes can be identified in a number of ways, which
are discussed below:



  1. Using computer programs
    As was shown above, the open reading frame (ORF)
    is deduced from the start and stop codons. In prokaryotic
    cells, which do not have many introns (intervening
    non-coding regions within genes), the ORF will in most
    cases indicate a gene. However, it is tedious to
    determine ORFs manually, and many computer programs
    now exist which will scan the base sequences of a
    genome and identify putative genes. Some of the
    programs are given in Table 3.2. In scanning a genome
    or DNA sequence for genes (i.e., in searching for
    functional ORFs), the computer programs take the
    following into account:
    (a) Functional ORFs are usually fairly long and
        rarely contain fewer than 100 codons
        (i.e., 300 nucleotides).
    (b) If the types of codons found in the ORF being
        studied are also those used in known functional
        ORFs, then the ORF being studied is likely to
        be functional.
    (c) The ORF is also likely to be functional if its
        sequence is similar to functional sequences
        in the genomes of other organisms.
    (d) In prokaryotes, ribosomal translation does not
        necessarily start at the first possible (earliest 5′)
        codon. Instead, it starts at the start codon just
        downstream of the Shine–Dalgarno ribosome-binding
        sequence. The Shine–Dalgarno sequence

5'                                                  3'
atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa

1 atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
   M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
2 tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
   C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
3 gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt
   A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C

Fig. 3.8 Sequence from a hypothetical DNA fragment (From Cooper (2008), on behalf of Board of Regents, University of
Wisconsin; http://bioweb.uwlax.edu/genweb/molecular/seq_anal/translation/translation.html. Reproduced with permission)
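
Criteria (a) and (d) in the list above lend themselves to a small illustration. The sketch below is an assumption-laden toy, not the algorithm of any program in Table 3.2: it scans only the forward strand, accepts only atg as a start codon, and looks for the commonly cited Shine–Dalgarno consensus aggagg within a 20-base window upstream of the start. The function names, the window size and the lowered length threshold used in the example are all hypothetical choices made for illustration.

```python
STOPS = {"taa", "tag", "tga"}


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs running from atg
    to the next in-frame stop, keeping those of at least min_codons codons
    (the stop codon itself is not counted). Positions are 0-based and end
    is exclusive; frame is 1, 2 or 3 relative to the start of dna."""
    dna = dna.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "atg":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame + 1))
                start = None  # resume the search after the stop codon
    return orfs


def has_sd_motif(dna: str, start: int, motif: str = "aggagg",
                 window: int = 20) -> bool:
    """True if motif occurs within window bases upstream of start.
    Both the motif and the window size are illustrative assumptions."""
    upstream = dna[max(0, start - window):start].lower()
    return motif in upstream


# The Fig. 3.8 fragment is far shorter than a real gene, so the threshold
# is lowered from 100 codons to 10 purely to make the toy example fire.
fig_3_8 = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
for start, end, frame in find_orfs(fig_3_8, min_codons=10):
    print(start, end, frame, has_sd_motif(fig_3_8, start))
```

Real gene finders combine such signals with codon usage and similarity searches, as criteria (b) and (c) indicate; this sketch deliberately omits both.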


Table 3.2 Some Internet tools for gene discovery in DNA sequence databases (Modified from Fickett 1996)


Category             Services                                   Organism(s)       Web address
Database search      BLAST; search sequence databases           Any               [email protected]
                     FASTA; search sequence databases           Any               [email protected]
                     BLOCKS; search for functional motifs       Any               [email protected]
                     Profilescan                                Any               http://ulrec3.unil.ch
                     MotifFinder                                Any               [email protected]
Gene identification  FGENEH; integrated gene identification     Human             [email protected]
                     GeneID; integrated gene identification     Vertebrate        [email protected]
                     GRAIL; integrated gene identification      Human             [email protected]
                     EcoParse; integrated gene identification   Escherichia coli
