done using programs such as BLAST and PSI-BLAST. If sequence homology is found
with a protein of known function, either from the same or different species, then this
usually gives a strong indication of the function of the protein. However, this approach does not
always work. For example, when the genome of the yeast Saccharomyces cerevisiae
was completely sequenced in 1996, 6000 genes were identified. Of these, approximately
2000 coded for proteins that were already known to exist in yeast (i.e. had been
purified and studied in previous years), 2000 had homology with known sequences
and hence their function could be deduced by the homology method but 2000 could
not be matched to any known genes, i.e. they were ‘new’, previously undiscovered
genes. In these cases, there are a number of other computational methods that can be
used to help to identify the protein’s function. These include:
- Phylogenetic profile method: This method aims to identify any other protein(s) that
have the same phylogenetic profile (i.e. the same pattern of presence or absence) as the
unknown protein, in all known genomes. If such proteins are found, it is inferred that
the unknown protein is involved in the same cellular process as these other protein(s)
(i.e. they are said to have a functional link), which gives a strong clue as to the
function of the unknown protein. This method is based on the premise that two
proteins would not always both be inherited into a new species (or neither inherited)
unless the two proteins have a functional link. At the time of writing there are over
100 published genome sequences that can be surveyed with this method. Fig. 8.8
shows a simple, hypothetical example, where just five genomes are analysed.
- Method of correlated gene neighbours: If two genes are found to be neighbours in
several different genomes, a functional linkage may be inferred between the two
proteins. The central assumption of this approach is based on the observation that
functionally related genes in prokaryotes tend to be linked to form operons (e.g. the
lac operon). Although operons are rare in eukaryotic species, it does appear that
proteins involved in the same biological process/pathway within the cell have their
genes situated in close proximity (e.g. within 500 bp) in the genome. Thus, if two
genes are found to be in close proximity across a number of genomes, it can be
inferred that the protein products of these genes have a functional linkage. This
method is most robust for microbial genomics but works to some extent in human
cells where operon-like clusters are also observed. As an example, this method
correctly identified a functional link between eight enzymes in the biosynthetic
pathway for the amino acid arginine in Mycobacterium tuberculosis.
- Analysis of fusion: This method is based on the observation that two genes may exist
separately in one organism, whereas the genes are fused into a single multifunctional
gene in another organism. The existence of the protein product of the fused gene,
in which the two functions of the protein clearly interact (being part of the same
protein molecule), suggests that in the first organism the two separate proteins also
interact. It has been suggested that gene fusion events occur to reduce the regulatory
load of multiple interacting gene products.
- Protein–protein interactions: A further clue to identifying protein function can
come from identifying protein–protein interactions, and methods to identify these are
described in the next section.
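
The phylogenetic profile method described above can be illustrated with a minimal Python sketch. It assumes that presence (1) or absence (0) of each protein has already been determined in five genomes; the genome names, protein names and profile values are all invented for illustration and do not come from any real data set. Real analyses typically also tolerate near-matches, which is why a simple mismatch count is included alongside the exact comparison.

```python
# Hypothetical presence (1) / absence (0) of each protein in five genomes.
# All names and values are invented for illustration only.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D", "genome_E"]

PROFILES = {
    "unknown": (1, 0, 1, 1, 0),
    "P1": (1, 0, 1, 1, 0),
    "P2": (1, 1, 1, 1, 1),
    "P3": (0, 1, 0, 0, 1),
    "P4": (1, 0, 1, 1, 0),
}


def matching_profiles(query, profiles):
    """Return the proteins whose profile is identical to that of the query."""
    target = profiles[query]
    return sorted(
        name
        for name, profile in profiles.items()
        if name != query and profile == target
    )


def hamming_distance(a, b):
    """Count the genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


# Proteins sharing the profile of the unknown protein are candidates
# for a functional link with it.
print(matching_profiles("unknown", PROFILES))  # ['P1', 'P4']
```

In this invented example, P1 and P4 share the profile of the unknown protein and would therefore be candidates for a functional link, whereas P2 (present in every genome) and P3 (the opposite pattern) would not.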
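
The method of correlated gene neighbours can be sketched in a similar way. The example below assumes that each gene is represented by a single start coordinate in each genome and treats two genes as neighbours when those coordinates lie within 500 bp of each other. This is a simplification: a real analysis would use gene boundaries, strand and genome circularity. The coordinates, gene names and the `min_genomes` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gene start coordinates (bp) in four genomes.
# Names and positions are invented for illustration only.
GENE_POSITIONS = {
    "genome_A": {"geneX": 1000, "geneY": 1400, "geneZ": 90000},
    "genome_B": {"geneX": 52000, "geneY": 52300, "geneZ": 7000},
    "genome_C": {"geneX": 300, "geneY": 650, "geneZ": 5000},
    "genome_D": {"geneX": 20000, "geneZ": 45000},
}


def neighbour_counts(positions, max_gap=500):
    """Count, for each gene pair, the genomes in which the pair lies within max_gap bp."""
    counts = Counter()
    for genes in positions.values():
        for a, b in combinations(sorted(genes), 2):
            if abs(genes[a] - genes[b]) <= max_gap:
                counts[(a, b)] += 1
    return counts


def linked_pairs(positions, max_gap=500, min_genomes=3):
    """Return gene pairs that are neighbours in at least min_genomes genomes."""
    counts = neighbour_counts(positions, max_gap)
    return sorted(pair for pair, n in counts.items() if n >= min_genomes)


print(linked_pairs(GENE_POSITIONS))
```

Here only geneX and geneY are close together in three of the four genomes, so only that pair would be proposed as functionally linked.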
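
The logic of fusion analysis can also be expressed as a short sketch. It assumes that every protein has already been annotated with the set of domain families it contains, and it reports two separate proteins in one organism as linked when a single protein in another organism contains the domains of both. The organisms, protein names and domain labels are hypothetical, and real fusion detection relies on sequence alignment rather than on ready-made domain labels.

```python
from itertools import combinations

# Hypothetical domain annotations: protein name -> set of domain families.
# All names are invented for illustration only.
ORGANISM_1 = {"protA": {"dom1"}, "protB": {"dom2"}, "protC": {"dom3"}}
ORGANISM_2 = {"fusedAB": {"dom1", "dom2"}, "protD": {"dom3"}}


def fusion_linked_pairs(separate, other):
    """Find pairs of separate proteins whose domains co-occur in one protein elsewhere."""
    linked = []
    for a, b in combinations(sorted(separate), 2):
        dom_a, dom_b = separate[a], separate[b]
        # Skip unannotated proteins and pairs with identical domain content.
        if not dom_a or not dom_b or dom_a == dom_b:
            continue
        for fused, domains in sorted(other.items()):
            # The candidate fused protein must contain the domains of both partners.
            if dom_a <= domains and dom_b <= domains:
                linked.append((a, b, fused))
    return linked


print(fusion_linked_pairs(ORGANISM_1, ORGANISM_2))
```

In this invented case, protA and protB would be predicted to interact because their domains appear together in the single protein fusedAB in the second organism.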