promoters, transcription initiation sites, and “stop” codons, to deduce the positions of protein-encoding genes. In practice, much of the generation of genome sequences relies heavily on computer programs.
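
The simplest version of this gene-finding step can be sketched in Python: scan the DNA for open reading frames (ORFs) that run from an ATG start codon to an in-frame stop codon. This sketch rests on several assumptions and is not the method used by any genome project. It scans only the forward strand, treats ATG as the only start codon, and ignores introns, which real fungal gene predictors must model. The `min_codons` cut-off is a hypothetical parameter.

```python
# Illustrative open-reading-frame (ORF) scan, not a real gene predictor.
# Assumptions: forward strand only, ATG as the only start codon, the three
# standard stop codons, no introns, and a hypothetical minimum ORF length.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of ATG...stop ORFs in all three forward frames.

    Positions are 0-based and end-exclusive; the stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # resume searching for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGA"))  # [(2, 11)]
```

A real annotation pipeline combines signals like these with evidence such as promoter motifs, splice sites and codon-usage statistics, so an ORF list of this kind is only a starting point.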

Most genome sequencing projects use the whole genome shotgun approach. In this approach, short sequences are generated at random rather than systematically and are then assembled into contigs (contiguous stretches of sequence). Typically, every part of the sequence is covered at least five times, and often ten times, by overlapping sequenced fragments. The result is a “high quality draft” of the whole genome. However, about 2% of the genome often cannot be mapped accurately, particularly in regions with frequent repeat sequences and in DNA regions of high G+C (guanine + cytosine) content.
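
The assembly of shotgun reads into contigs can be illustrated with a toy greedy assembler. It is a simplified sketch and not the algorithm used by any particular sequencing project. It assumes error-free reads from a single strand and a hypothetical minimum overlap of three bases. It then repeatedly merges the two fragments with the longest suffix-prefix overlap until no overlaps remain, and whatever is left over corresponds to separate contigs.

```python
# Toy greedy assembler for shotgun reads (illustrative sketch only).
# Assumptions: error-free reads, all from the same strand, no long repeats,
# and a hypothetical minimum overlap length of 3 bases.


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by longest overlap; return the resulting contigs."""
    # Deduplicate, sort deterministically (longest first), and drop any read
    # that lies entirely inside a longer one.
    contigs: list[str] = []
    for read in sorted(set(reads), key=lambda r: (-len(r), r)):
        if not any(read in c for c in contigs):
            contigs.append(read)

    while True:
        # Find the ordered pair (i, j) with the longest suffix-prefix overlap.
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No overlaps left: each remaining fragment is a separate contig.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["GACCTGCC", "GGAATAC", "ATTAGACC", "TGCCGGAA"]  # hypothetical reads
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Repeats are the weak point of this approach. When a repeated sequence is longer than the reads, overlaps become ambiguous and a greedy merge can join the wrong fragments. This is one reason repeat-rich regions remain hard to map.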

What do we gain from whole genome sequences?

Many benefits accrue from comparing the genomes of different fungi, or from comparing fungal genomes with those of other organisms. To give just a few examples:

The nucleotide sequences of genes can be used to predict the sequences of the proteins they encode. Automated searches such as BLAST (Basic Local Alignment Search Tool) can then identify homologous genes in different organisms and reveal how those genes have changed over evolutionary time.

Nucleotide sequences that have no match in current databases may represent genes with undiscovered functions. This is the basis of “gene mining” for potential new proteins of commercial interest.

Many aspects of human, animal and plant disease are still unresolved, so the elucidation of genes controlling these processes could provide new directions for tackling these problems.

Above all, as more and more gene functions are discovered, they add to the sum of knowledge. Because most of this information is freely available, the discovery of a newly characterized gene in one organism can help to “fill the gaps” in the genomes of other organisms. The first eukaryotic genome to be sequenced was that of Saccharomyces cerevisiae, released in 1996. It showed that about half of the open reading frames (ORFs) had no clear homologs in published databases. This pattern has been repeated time and again in the genomes sequenced since then.
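
The first of these examples, predicting protein sequences from genes and then searching for homologs, can be sketched in two steps. First, the code translates a coding sequence using the standard genetic code. Second, it scores a local alignment of two protein sequences with the Smith-Waterman dynamic-programming algorithm. BLAST itself uses a faster heuristic that approximates this kind of local alignment, and real searches use substitution matrices such as BLOSUM62. The simple match, mismatch and gap scores here are assumptions chosen for illustration, and the two coding sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon ('*')."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for codons containing N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    gene_x = "ATGGCCAAACTGTAA"  # hypothetical coding sequences that differ
    gene_y = "ATGGCTAAGCTCTAA"  # only at third codon positions
    px, py = translate(gene_x), translate(gene_y)
    print(px, py)  # MAKL MAKL
    print(local_alignment_score(px, py))  # 8
```

The two hypothetical genes differ at three nucleotide positions, all at third codon positions, yet they encode the same protein. Synonymous changes like these are one reason why comparisons at the protein level can detect relationships that are blurred at the DNA level.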

Genomics is “Big Science.” The sequencing of S. cerevisiae involved 90 research laboratories, and the paper describing the genome sequence of Neurospora crassa had 77 authors. Thousands of fungal genomes could potentially be sequenced, and projects on this scale need careful planning, so it is important to prioritize and coordinate sequencing efforts. As one example, since 2000 the fungal research community of the USA has undertaken broad consultation and published a series of “White Papers” on the Fungal Genome Initiative (FGI). Of 15 candidate fungi initially proposed, seven are currently being sequenced. In 2003 the second White Paper listed 44 additional fungi, with emphasis on clusters of related species to promote comparative genome analysis: [http://www.broad.mit.edu/annotation/fungi/fgi]

In a field that is moving so rapidly, and where funding decisions have still to be made, the FGI website (address above) is the most reliable source of information. Nevertheless, Table 9.4 gives brief details of the original 15 submissions to illustrate the rationale behind such sequencing efforts. These submissions included representatives of all the major fungal phyla (Chytridiomycota, Zygomycota, Ascomycota, and Basidiomycota). They also covered organisms in three categories: those of medical significance, those of commercial significance, and those that would contribute to understanding of evolution and fungal diversity.

Significant findings from the Neurospora crassa genome sequence

The high quality draft sequence of the N. crassa genome was completed in 2003 (Galagan et al. 2003). It represents a milestone: the culmination of more than 60 years of research on one of the most genetically well-characterized fungi. The sequence still needs further detailed work to check potential discrepancies and to join the existing contigs. Even so, it has already revealed new information, including the identification of genes potentially associated with light signalling and secondary metabolism. The main features of the sequenced N. crassa genome are shown in Table 9.5 (from Galagan et al. 2003). Among the more notable points is the predicted presence of over 10,000 protein-encoding genes, most of which code for proteins of more than 100 amino acids. But 41% of the Neurospora proteins lack significant matches to any known proteins in public databases. In addition, 57% of Neurospora proteins lack significant matches to genes in either Saccharomyces cerevisiae or Schizosaccharomyces pombe.

Another interesting feature of Neurospora is that it has the widest range of genome defense mechanisms known for any eukaryotic organism. One of these is a process apparently unique to fungi, termed repeat-induced point mutation (RIP). RIP was discovered in Neurospora several years ago, as a process that effectively prevents genome evolution by gene duplication. The duplication of genes is widely recognized to be responsible for evolutionary development, because the

FUNGAL GENETICS, MOLECULAR GENETICS, AND GENOMICS 177
