promoters, transcription initiation sites, and “stop”
codons, to deduce the positions of protein-encoding
genes. In practice, much of the generation of genome
sequences relies heavily on computer programs.
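The stop-codon logic described above can be illustrated with a toy open reading frame (ORF) scan. This is a minimal sketch, not how any particular annotation program works. The function name `find_orfs` and the 100-codon default threshold are hypothetical choices. The scan covers only the forward strand and ignores promoters, transcription initiation sites and introns, all of which real gene-prediction software has to model in fungal genomes.

```python
# Toy ORF finder: a sketch only. Real fungal gene prediction must also
# handle introns, promoters and the reverse strand, which this ignores.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts codons from the ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                          # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):     # step codon by codon
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":    # first ATG opens the frame
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # keep only long enough ORFs
                    orfs.append((frame, start, i + 3))
                start = None                        # look for the next ATG
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=3)` returns `[(2, 2, 14)]`: an ORF in reading frame 2 made of the codons ATG AAA TTT and closed by the TAG stop codon.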
Most genome sequencing projects use the whole genome shotgun approach, in which short sequences are generated from randomly chosen DNA fragments rather than systematically, and are then assembled into contigs. Typically, every part of the sequence is covered at least five times and often ten times (through overlap between the sequenced regions). The result is a “high quality draft” of the whole genome, but often about 2% of the genome cannot be mapped accurately, in regions with frequent repeat sequences or high G+C (guanine + cytosine) content.
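The “five times” and “ten times” figures are the sequencing depth, or coverage: total bases sequenced divided by genome length. A standard idealized model for shotgun sequencing (Lander-Waterman) assumes reads land uniformly at random, so the expected fraction of the genome hit by no read at coverage c is about e^(-c). The sketch below computes both quantities. The function names and the example genome and read sizes are hypothetical illustrations, not figures from the text, and the model ignores repeats and G+C bias.

```python
import math


def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average sequencing depth: total bases sequenced / genome length."""
    return num_reads * read_length / genome_length


def expected_uncovered_fraction(c: float) -> float:
    """Lander-Waterman estimate of the fraction of bases covered by no read.

    Assumes reads start uniformly at random, so depth at a base is
    Poisson-distributed and P(depth == 0) = exp(-c).
    Repeats and G+C bias are ignored.
    """
    return math.exp(-c)


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_length / read_length)


if __name__ == "__main__":
    for c in (5, 10):
        print(f"{c}x coverage: {expected_uncovered_fraction(c):.4%} expected uncovered")
```

For a hypothetical 40-Mb genome sequenced with 800-base reads, `reads_needed(10, 800, 40_000_000)` gives 500,000 reads for tenfold coverage, and `expected_uncovered_fraction(5)` is about 0.0067 (0.67%). Random sampling alone therefore predicts well under 2% of the genome left uncovered. Because the model ignores repeats and G+C bias, it does not capture the regions the text identifies as hard to map.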
What do we gain from whole genome sequences?
Many benefits accrue from comparing the genomes of
different fungi, or from comparing fungal genomes with
those of other organisms. To give just a few examples:
- The nucleotide sequences of genes can be used to predict the protein sequences, so automated searches such as BLAST (basic local alignment search tool) can identify homologous genes in different organisms (and any evolutionary changes in those genes over different periods of time).
- Nucleotide sequences that do not appear in current databases may represent genes with undiscovered functions; this is the basis of “gene mining” for potential new proteins of commercial interest.
- Many aspects of human, animal and plant disease are still unresolved, so the elucidation of genes controlling these processes could provide new directions for tackling these problems.
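The first point above, predicting a protein sequence from a gene's nucleotide sequence before a homology search such as BLAST, can be sketched as a codon-by-codon lookup in the standard genetic code (NCBI translation table 1). This sketch assumes an intron-free coding sequence read in frame from its start codon; fungal genes often contain introns, so real pipelines must first predict splice sites. The names `CODON_TABLE` and `translate` are hypothetical, and the homology search itself is not shown.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons listed in TCAG order,
# with the third base varying fastest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate an in-frame, intron-free coding sequence into one-letter amino acids.

    Codons not in the table (e.g. containing N) become "X".
    If `to_stop` is True, translation ends at the first stop codon.
    """
    cds = cds.upper().replace("U", "T")   # accept mRNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):   # ignore a trailing partial codon
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")` returns `"MAIVMGR"`, stopping at the TGA codon. A protein string like this is the kind of query that a protein-level homology search compares against database entries.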
Above all, as more and more gene functions are discovered they add to the sum of knowledge, and since most of this information is freely available, the discovery of a newly characterized gene in one organism can help to “fill the gaps” in the genomes of other organisms. The genome sequence of Saccharomyces cerevisiae, the first eukaryote to be sequenced, was released in 1996 and showed that about half of the open reading frames (ORFs) had no clear homologs in published databases. This pattern has been repeated time and again in the genomes sequenced since then.
Genomics is “Big Science.” The sequencing of S. cerevisiae involved 90 research laboratories, and the paper describing the genome sequence of Neurospora crassa had 77 authors. Given the scale of these projects, and the thousands of fungal genomes that could potentially be sequenced, it is important to prioritize and coordinate sequencing efforts. As one example, the fungal research community of the USA has, since the year 2000, undertaken broad consultation and published a series of “White Papers” on the Fungal Genome Initiative (FGI). Of 15 candidate fungi initially proposed, seven are currently being sequenced. In 2003 the second White Paper included a list of 44 additional fungi, with emphasis on clusters of related species to promote comparative genome analysis: [http://www.broad.mit.edu/annotation/fungi/fgi]
In a field that is moving so rapidly, and where funding decisions have still to be made, the FGI website (address above) is the most reliable source of information. However, Table 9.4 gives brief details of the original 15 submissions to illustrate the rationale behind such sequencing efforts. The list included representatives of all the major fungal phyla (Chytridiomycota, Zygomycota, Ascomycota, and Basidiomycota) and organisms in three categories: those of medical significance, those of commercial significance, and those that would contribute to our understanding of evolution and fungal diversity.
Significant findings from the Neurospora crassa genome sequence
The high quality draft sequence of the N. crassa genome was completed in 2003 (Galagan et al. 2003) and represents a milestone: the culmination of more than 60 years of research on one of the most genetically well characterized fungi. The sequence still needs further detailed work to check potential discrepancies and to join the existing contigs, but it has already revealed new information, including the identification of genes potentially associated with light signalling and secondary metabolism.
The main features of the sequenced N. crassa genome are shown in Table 9.5 (from Galagan et al. 2003). Among the more notable points is the predicted presence of over 10,000 protein-encoding genes, most of which code for proteins of more than 100 amino acids. But 41% of the Neurospora proteins lack significant matches to any of the known proteins in public databases, and 57% of Neurospora proteins lack significant matches to genes in either Saccharomyces cerevisiae or Schizosaccharomyces pombe.
Another interesting feature of Neurospora is that it has the widest range of genome defense mechanisms known for any eukaryotic organism. One of these is a process apparently unique to fungi, termed repeat-induced point mutation (RIP).
RIP was discovered in Neurospora several years ago, as a process that effectively prevents genome evolution.
The duplication of genes is widely recognized to be
responsible for evolutionary development, because the