Catalyzing Inquiry at the Interface of Computing and Biology

COMPUTATIONAL TOOLS 95

recombination, and smaller numbers of relatively recent mutations.^107 Examining the variation of gene
or protein sequences between different species helps to draw a picture of the pedigree of a particular
gene or protein over evolutionary time, but scientists are also interested in understanding the practical
significance of such variation within a single species.
Geneticists have been trying for decades to identify the genetic variation among individuals in the
human species that results in physical differences between them. There is an increasing recognition of
the importance of genetic variation for medicine and developmental biology and for understanding the
early demographic history of humans.^108 In particular, variation in the human genome sequence is
believed to play a powerful role in the origins of and prognoses for common medical conditions.^109
The total number of unique mutations that might exist collectively in the entire human population
is not known definitively and has been estimated at upward of 10 million,^110 which in a 3 billion base-pair
genome corresponds to a variant every 300 bases or less. Included in these are single-nucleotide
polymorphisms (SNPs), that is, single-nucleotide sites in the genome where two or more of the four
bases (A, C, T, G) occur in at least 1 percent of the population. Many SNPs were discovered in the
process of overlapping the ends of DNA sequences used to assemble the human genome, when these
sequences came from different individuals or from different members of a chromosome pair from the
same individual. The average number of differences observed between the DNA of any two unrelated
individuals represented at 1 percent or more in the population is one difference in every 1,300 bases; this
leads to the estimation that individuals differ from one another at 2.4 million places in their genomes.^111
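The arithmetic behind these figures, and the 1 percent threshold in the SNP definition, can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited papers: the constants are the round numbers quoted in the text, the function names are hypothetical, and variants are assumed to be spread uniformly along the genome.

```python
# Illustrative arithmetic only; constants are the round figures quoted in the text.
GENOME_SIZE_BP = 3_000_000_000   # "3 billion base-pair genome"
ESTIMATED_VARIANTS = 10_000_000  # "upward of 10 million" unique mutations
BASES_PER_DIFFERENCE = 1_300     # "one difference in every 1,300 bases"
SNP_MIN_FREQUENCY = 0.01         # "at least 1 percent of the population"


def mean_variant_spacing(genome_size_bp: int, n_variants: int) -> float:
    """Average number of bases per variant site, assuming a uniform spread."""
    return genome_size_bp / n_variants


def expected_pairwise_differences(genome_size_bp: int, bases_per_difference: int) -> float:
    """Expected number of sites at which two unrelated genomes differ."""
    return genome_size_bp / bases_per_difference


def is_snp(allele_counts: dict, min_frequency: float = SNP_MIN_FREQUENCY) -> bool:
    """Return True if at least two bases each reach min_frequency at this site.

    allele_counts maps a base ("A", "C", "T", "G") to the number of sampled
    chromosomes carrying it (hypothetical input format).
    """
    total = sum(allele_counts.values())
    if total == 0:
        return False
    common = [base for base, n in allele_counts.items() if n / total >= min_frequency]
    return len(common) >= 2


if __name__ == "__main__":
    print(mean_variant_spacing(GENOME_SIZE_BP, ESTIMATED_VARIANTS))
    print(round(expected_pairwise_differences(GENOME_SIZE_BP, BASES_PER_DIFFERENCE)))
    print(is_snp({"A": 900, "G": 100}), is_snp({"A": 999, "G": 1}))
```

With a genome length of exactly 3 billion bases the second calculation gives roughly 2.3 million differing sites, the same order as the 2.4 million quoted above; the exact value depends on the genome length assumed.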
In rare cases, a single SNP has been directly associated with a medical condition, such as sickle cell
anemia or cystic fibrosis. However, most common diseases such as diabetes, cancer, stroke, heart disease,
depression, and arthritis (to name a few) appear to have complex origins and involve the participation
of multiple genes along with environmental factors. For this reason there is interest in identifying
those SNPs occurring across the human genome that might be correlated with common medical conditions.
SNPs found within the exons of genes are of greatest interest because they are believed to be
potentially related to changes in proteins that affect a predisposition to disease, but because most of the
genome does not code for proteins (and indeed a number of noncoding SNPs have been found^112), the
functional impact of many SNPs is unknown.
Armed with rapid DNA sequencing tools and the ability to detect single-base differences, an international
consortium looked for SNPs in individuals over the last several years, ultimately identifying
more than 3 million unique SNPs and recording their locations on the genome in a public database. SNP maps of
the human genome with a density of about one SNP per thousand nucleotides have been developed. An
effort under way in Iceland known as deCODE seeks to correlate SNPs with human diseases.^113 However,
determining which combinations of the estimated 10 million SNPs are associated with particular disease
states, predisposition to disease, and genes that contribute to disease remains a formidable challenge.
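One way to get a feel for the scale of this challenge is to count how many SNP combinations an exhaustive search would have to consider. The sketch below is only an illustration under stated assumptions: it uses the round figure of 10 million SNPs from the text, treats every pair or triple as a candidate, and uses a hypothetical helper name; it does not describe how any actual association study is carried out.

```python
import math

N_SNPS = 10_000_000  # round figure quoted in the text


def n_snp_combinations(n_snps: int, k: int) -> int:
    """Number of distinct unordered sets of k SNPs drawn from n_snps sites."""
    return math.comb(n_snps, k)


if __name__ == "__main__":
    for k in (2, 3):
        print(k, n_snp_combinations(N_SNPS, k))
```

Pairs alone number about 5 x 10^13, and triples exceed 10^20, so testing every combination directly is impractical at this scale.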
Some research on this problem has recently focused on the discovery that specific combinations
of SNPs on a chromosome (called “haplotypes”) occur in blocks that are inherited together; that is, they


(^107) D. Posada and K.A. Crandall, “Intraspecific Gene Genealogies: Trees Grafting into Networks,” Trends in Ecology and Evolution 16(1):37-45, 2001.
(^108) L.L. Cavalli-Sforza and M.W. Feldman, “The Application of Molecular Genetic Approaches to the Study of Human Evolution,” Nature Genetics 33 (Suppl.):266-275, 2003.
(^109) S.B. Gabriel, S.F. Schaffner, H. Nguyen, J.M. Moore, J. Roy, B. Blumenstiel, J. Higgins, et al., “The Structure of Haplotype
Blocks in the Human Genome,” Science 296(5576):2225-2229, 2002.
(^110) L. Kruglyak and D.A. Nickerson, “Variation Is the Spice of Life,” Nature Genetics 27(3):234-236, 2001, available at
http://nucleus.cshl.edu/agsa/Papers/snp/Kruglyak_2001.pdf.
(^111) The International SNP Map Working Group, “A Map of Human Genome Sequence Variation Containing 1.42 Million Single
Nucleotide Polymorphisms,” Nature 409:928-933, 2001.
(^112) See, for example, D. Trikka, Z. Fang, A. Renwick, S.H. Jones, R. Chakraborty, M. Kimmel, and D.L. Nelson, “Complex SNP-based
Haplotypes in Three Human Helicases: Implications for Cancer Association Studies,” Genome Research 12(4):627-639, 2002.
(^113) See http://www.decode.com.
