
done using programs such as BLAST and PSI-BLAST. If sequence homology is found
with a protein of known function, either from the same or different species, then this
invariably identifies the function of the protein. However, this approach does not
always work. For example, when the genome of the yeast Saccharomyces cerevisiae
was completely sequenced in 1996, 6000 genes were identified. Of these, approximately
2000 coded for proteins that were already known to exist in yeast (i.e. had been
purified and studied in previous years), 2000 had homology with known sequences
and hence their function could be deduced by the homology method, but 2000 could
not be matched to any known genes, i.e. they were ‘new’, previously undiscovered
genes. In these cases, there are a number of other computational methods that can be
used to help to identify the protein’s function. These include:


  • Phylogenetic profile method: This method aims to identify any other protein(s) that
    have the same phylogenetic profile (i.e. the same pattern of presence or absence) as the
    unknown protein, in all known genomes. If such proteins are found, it is inferred that
    the unknown protein is involved in the same cellular process as these other protein(s)
    (i.e. they are said to have a functional link), which gives a strong clue as to the
    function of the unknown protein. This method is based on the premise that two
    proteins would not always both be inherited into a new species (or neither inherited)
    unless the two proteins have a functional link. At the time of writing there are over
    100 published genome sequences that can be surveyed with this method. Fig. 8.8
    shows a simple, hypothetical example, where just five genomes are analysed.

  • Method of correlated gene neighbours: If two genes are found to be neighbours in
    several different genomes, a functional linkage may be inferred between the two
    proteins. The central assumption of this approach is based on the observation that
    functionally related genes in prokaryotes tend to be linked to form operons (e.g. the
    lac operon). Although operons are rare in eukaryotic species, it does appear that
    proteins involved in the same biological process/pathway within the cell have their
    genes situated in close proximity (e.g. within 500 bp) in the genome. Thus, if two
    genes are found to be in close proximity across a number of genomes, it can be
    inferred that the protein products of these genes have a functional linkage. This
    method is most robust for microbial genomics but works to some extent in human
    cells where operon-like clusters are also observed. As an example, this method
    correctly identified a functional link between eight enzymes in the biosynthetic
    pathway for the amino acid arginine in Mycobacterium tuberculosis.

  • Analysis of fusion: This method is based on the observation that two genes may exist
    separately in one organism, whereas the genes are fused into a single multifunctional
    gene in another organism. The existence of the protein product of the fused gene,
    in which the two functions of the protein clearly interact (being part of the same
    protein molecule), suggests that in the first organism the two separate proteins also
    interact. It has been suggested that gene fusion events occur to reduce the regulatory
    load of multiple interacting gene products.

  • Protein–protein interactions: A further clue to identifying protein function can
    come from identifying protein–protein interactions, and methods to identify these are
    described in the next section.
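
The profile comparison at the heart of the first method above can be illustrated with a minimal Python sketch. The five genome labels, the protein names and the presence/absence data are all hypothetical (they are not the data of Fig. 8.8), and the sketch assumes that the presence of a homologue in each genome has already been decided, for example by a sequence search.

```python
# Hypothetical genomes surveyed (labels are placeholders, not real species).
GENOMES = ["G1", "G2", "G3", "G4", "G5"]

# Hypothetical data: protein -> set of genomes that encode a homologue of it.
presence = {
    "P1": {"G1", "G3", "G4"},
    "P2": {"G1", "G2", "G5"},
    "P3": {"G1", "G3", "G4"},
    "P4": {"G2", "G3", "G4", "G5"},
    "unknown": {"G1", "G3", "G4"},
}


def profile(protein):
    """Return the presence (1) / absence (0) pattern of a protein across GENOMES."""
    return tuple(int(g in presence[protein]) for g in GENOMES)


def matching_profiles(query):
    """Return the proteins whose profile is identical to that of the query."""
    target = profile(query)
    return sorted(p for p in presence if p != query and profile(p) == target)


print(profile("unknown"))            # (1, 0, 1, 1, 0)
print(matching_profiles("unknown"))  # ['P1', 'P3']
```

Here P1 and P3 share the profile of the unknown protein, so a functional link with them would be inferred. Requiring an exact match is a simplifying assumption; with many genomes a tolerance for a few mismatches might be preferred.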

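The method of correlated gene neighbours described above can be sketched in the same way. The gene coordinates, genome labels and the cut-off of three genomes are hypothetical; the 500 bp threshold is taken from the example distance quoted in the text. The sketch also assumes that the genes have already been grouped into families ("x", "y", "z") that are comparable between genomes.

```python
from collections import Counter
from itertools import combinations

MAX_GAP_BP = 500   # proximity threshold, from the "within 500 bp" example
MIN_GENOMES = 3    # hypothetical cut-off for "several different genomes"

# Hypothetical data: genome -> {gene family: (start, end)} in base pairs.
genomes = {
    "A": {"x": (100, 900), "y": (1000, 1800), "z": (9000, 9800)},
    "B": {"x": (5000, 5700), "y": (5900, 6600), "z": (200, 1000)},
    "C": {"x": (300, 1200), "y": (1500, 2300), "z": (2500, 3300)},
    "D": {"x": (100, 800), "y": (7000, 7800)},
}


def gap(a, b):
    """Distance in bp between two gene intervals (0 if they overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def neighbour_counts(genomes):
    """Count, for each gene pair, the genomes in which the two genes are neighbours."""
    counts = Counter()
    for genes in genomes.values():
        for g1, g2 in combinations(sorted(genes), 2):
            if gap(genes[g1], genes[g2]) <= MAX_GAP_BP:
                counts[(g1, g2)] += 1
    return counts


counts = neighbour_counts(genomes)
linked = [pair for pair, n in counts.items() if n >= MIN_GENOMES]
print(linked)  # [('x', 'y')]
```

Genes x and y lie within 500 bp of each other in three of the four genomes, so a functional linkage would be inferred for them, whereas y and z are neighbours in only one genome and are not reported.
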

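The logic of the gene fusion analysis described above can likewise be sketched, assuming that alignment hits between the proteins of the first organism and candidate fused proteins of a second organism are already available. The hit list, protein names and coordinates are hypothetical, and the test used here (two different proteins matching non-overlapping regions of the same protein) is one simple way of expressing the idea rather than a definitive implementation.

```python
from itertools import combinations

# Hypothetical alignment hits:
# (protein in organism 1, protein in organism 2, start, end of match on the latter)
hits = [
    ("A", "F1", 1, 250),
    ("B", "F1", 260, 520),
    ("C", "F1", 10, 240),
    ("D", "F2", 1, 300),
]


def overlaps(h1, h2):
    """True if the two matched regions overlap on the organism 2 protein."""
    return h1[2] <= h2[3] and h2[2] <= h1[3]


def fusion_links(hits):
    """Pairs of organism 1 proteins matching separate regions of one organism 2 protein."""
    links = set()
    for h1, h2 in combinations(hits, 2):
        same_target = h1[1] == h2[1]
        if same_target and h1[0] != h2[0] and not overlaps(h1, h2):
            links.add((tuple(sorted((h1[0], h2[0]))), h1[1]))
    return sorted(links)


print(fusion_links(hits))  # [(('A', 'B'), 'F1'), (('B', 'C'), 'F1')]
```

Proteins A and B match separate regions of F1, so F1 behaves like a fusion of the two and an interaction between A and B would be inferred. A and C match the same region of F1 and are therefore not linked to each other.
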
346 Protein structure, purification, characterisation and function analysis
