nent of the Cancer Genome Anatomy Project (Packer et al. 2004) of the Na-
tional Cancer Institute (NCI). SNP500Cancer provides bidirectional sequenc-
ing information on a set of control DNA samples derived from anonymized
subjects (102 Coriell samples representing four self-described ethnic groups:
African/African-American, White, Hispanic, and Pacific Rim). All SNPs are
chosen from public databases and reports, and the choice of genes includes
a bias toward nonsynonymous and promoter SNPs in genes that have been
implicated in one or more cancers. The website is searchable by gene,
chromosome, gene ontology pathway, and known dbSNP ID. For each
analyzed SNP, the database includes the gene location and over 200 bp of
surrounding annotated sequence (including nearby SNPs). It also reports
frequency information, both overall and per subpopulation, and a
Hardy-Weinberg equilibrium calculation for each subpopulation.
Sequence-validated SNPs with a minor allele frequency greater than 5% are
entered into a high-throughput genotyping pipeline to determine
concordance for the same 102 samples. The website provides the conditions
for validated genotyping assays.
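
The two per-SNP calculations mentioned above, a Hardy-Weinberg equilibrium check and the 5% minor allele frequency cutoff, can be sketched as follows. This is a minimal illustration, not the SNP500Cancer implementation: the function names and the example genotype counts are hypothetical, and the one-degree-of-freedom chi-square test is an assumed (though common) choice of statistic.

```python
# Sketch only: function names and example counts are hypothetical,
# and the chi-square goodness-of-fit test is an assumed choice of statistic.

MAF_THRESHOLD = 0.05          # "greater than 5%" cutoff stated in the text
CHI2_CRITICAL_1DF = 3.841     # chi-square critical value, 1 d.f., alpha = 0.05


def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, chi-square statistic) for one
    biallelic SNP, given counts of the three genotypes AA, AB, and BB."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return min(p, q), chi2


def passes_maf_filter(maf, threshold=MAF_THRESHOLD):
    """True when the minor allele frequency is strictly above the cutoff."""
    return maf > threshold


# Hypothetical genotype counts for one SNP across 102 samples.
maf, chi2 = hardy_weinberg_chi_square(60, 36, 6)
print(f"MAF = {maf:.3f}, chi-square = {chi2:.3f}")
print("passes MAF filter:", passes_maf_filter(maf))
print("consistent with HWE at the 5% level:", chi2 < CHI2_CRITICAL_1DF)
```

In practice the same function would be applied once per subpopulation, since pooling groups with different allele frequencies can by itself produce an apparent departure from equilibrium.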
SeattleSNPs Database pga.mbt.washington.edu
SeattleSNPs is a collaboration between the University of Washington and
the Fred Hutchinson Cancer Research Center, funded as part of the National
Heart, Lung, and Blood Institute’s (NHLBI) Programs for Genomic
Applications (PGA). The goal of SeattleSNPs is to discover and model the
associations between single nucleotide sequence differences in the genes
and pathways that underlie inflammatory responses in humans. In addition to SNP
data (location, allele frequency, and function for coding SNPs), haplotypes
are presented graphically on the SeattleSNPs website. Information on
haplotype-tagging SNPs (htSNPs) is also provided, allowing fewer SNPs to be
genotyped per gene and thereby reducing cost and improving throughput. Data
are available in tab-delimited text files.
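
To illustrate why tagging SNPs reduce the number of assays needed, the sketch below picks a small set of tags such that every SNP is in strong linkage disequilibrium with at least one of them. It is an assumption-laden example rather than the procedure used by SeattleSNPs: the greedy selection strategy, the r² threshold of 0.8, the function names, and the toy haplotype data are all hypothetical.

```python
# Sketch only: the greedy strategy, the 0.8 threshold, and the toy data
# are illustrative assumptions, not the documented SeattleSNPs method.
from itertools import combinations


def r_squared(a, b):
    """Squared allelic correlation between two SNPs, where a and b list the
    0/1 alleles carried by the same phased haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = pab - pa * pb
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """haplotypes maps SNP id -> list of 0/1 alleles. Returns a list of tag
    SNPs such that every SNP has r^2 >= threshold with some tag."""
    snps = sorted(haplotypes)
    # Each SNP "covers" itself plus every SNP it is strongly correlated with.
    covers = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)
    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the untagged SNP that covers the most still-untagged SNPs.
        best = max(sorted(untagged), key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data: eight phased haplotypes at four hypothetical SNPs.
example_haplotypes = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0, 1, 0],  # mirror image of snp1
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],  # weakly correlated with the others
}
print(greedy_tag_snps(example_haplotypes))
```

With these toy haplotypes, the first three SNPs carry the same information, so genotyping one of them plus snp4 captures all four.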
GeneSNPs http://www.genome.utah.edu/genesnps
The GeneSNPs database is sponsored by the National Institute of Environ-
mental Health Sciences and is being developed by the University of Utah
Genome Center. GeneSNPs is a component of the Environmental Genome
Project and integrates gene, sequence, and polymorphism data into
individually annotated gene models. The human genes included are related to
DNA repair, cell cycle control, cell signaling, cell division, homeostasis and
metabolism, and are thought to play a role in susceptibility to environmen-
tal exposure. Data are available in HTML, FASTA, and XML formats. The