8.5 Proteomics and protein function
In order to completely understand how a cell works, it is necessary to understand the
function (role) of every single protein in that cell. The analysis of any specific disease
(e.g. cancer) will also require us to understand what changes have taken place in the
protein component of the cell, so that we can use this information to understand the
molecular basis of the disease, and thus design appropriate drug therapies and develop
diagnostic methods. (Just about every therapeutic drug that is currently in use has
a protein as its target.) The completion of the Human Genome Project might suggest
that it is no longer necessary to study proteins directly, since the amino acid sequence
of each protein can be deduced from the DNA sequence. This is not true for the
following reasons:
- First, although the DNA in each cell type in the body is the same, different sets of
genes are expressed in different tissues, and hence the protein component of a cell
varies from cell type to cell type. For example, some proteins are found in nearly all
cells (the products of the so-called house-keeping genes), such as those involved in glycolysis, whereas
the cells of particular tissues such as kidney, liver and brain contain proteins unique
to that tissue and necessary for the functioning of that particular tissue/organ. It is
therefore only by studying the protein component of a cell directly that we can
identify which proteins are actually present.
- Secondly, it is now appreciated that a single DNA sequence (gene) can encode multiple
proteins. This can occur in a number of ways:
(i) Alternative splicing of the mRNA transcript.
(ii) Variation in the translation ‘stop’ or ‘start’ sites.
(iii) Frameshifting, where a different set of triplet codons is translated, to give a
totally different amino acid sequence.
(iv) Post-translational modifications. The genome sequence defines the amino
acid sequence of a protein, but tells us nothing of any post-translational
modifications (Sections 8.2.1 and 9.5.5) that can occur once the polypeptide
chain is synthesised at the ribosome. Up to 10 different forms (variants)
of a single polypeptide chain can be produced by phosphorylation,
glycosylation, etc.
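To make mechanism (iii) concrete, the following minimal Python sketch translates one short DNA sequence in each of the three forward reading frames. The sequence and the function name translate are made up for illustration and are not taken from the text; the codon table is the standard genetic code. The sketch simply starts reading at a different offset, whereas in a cell the ribosome would shift frame partway along the mRNA, but the effect on the downstream amino acid sequence is the same.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna (coding strand, 5' to 3') starting at offset frame.

    Any incomplete codon at the end is ignored; stop codons appear as '*'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Hypothetical example sequence, chosen only for illustration.
sequence = "ATGGCCATTGTAATGGGCCGCTGA"

for frame in range(3):
    print(frame, translate(sequence, frame))
# prints:
# 0 MAIVMGR*
# 1 WPL*WAA
# 2 GHCNGPL
```

Shifting the reading frame by a single nucleotide changes every codon downstream, which is why the three translations share no resemblance to one another.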
The consequence of the above is that the total protein content of the human body is an
order of magnitude more complex than the genome. The human genome sequence
suggests there may be 30 000–40 000 genes (and hence, on a one gene, one protein basis, as many proteins) whereas estimates
of the actual number of proteins in human cells suggest possibly as many as 200 000
or even more. The dogma that one gene codes for one protein has been truly
demolished!
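As a rough check on these figures, the short Python sketch below divides the protein estimate by the gene estimates quoted above to give an average number of protein forms per gene. The variable names are illustrative assumptions, and the result is only as good as the estimates themselves.

```python
# Figures quoted in the text above.
genes_low, genes_high = 30_000, 40_000
proteins_estimate = 200_000

# Average number of distinct protein forms per gene.
forms_per_gene_min = proteins_estimate / genes_high
forms_per_gene_max = proteins_estimate / genes_low

print(f"{forms_per_gene_min:.1f} to {forms_per_gene_max:.1f} protein forms per gene")
# prints "5.0 to 6.7 protein forms per gene"
```

On these numbers each gene gives rise to roughly five to seven protein forms on average, and to more if the true protein count exceeds 200 000.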
From the above, I hope it is easy to appreciate the need to directly analyse the protein
component of the cell, and the need for an understanding of the function of each
individual protein in the cell. In recent years, development of new techniques (discussed
below) has enhanced our ability to study the protein component of the cell and has led
to the introduction of the terms proteome and proteomics. The total DNA composition