5.10 Single Nucleotide Polymorphisms
A single nucleotide polymorphism (SNP) is defined as a single base change that
occurs at a population frequency of at least 1%. SNPs are the most common
form of variation in the human genome, and they serve as important landmarks
in studies of both molecular evolution and disease mechanisms.
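The 1% frequency criterion can be made concrete by computing the minor allele frequency at a site from genotype calls. The following is a minimal sketch, assuming a biallelic site and diploid genotypes encoded as two-letter strings such as "AG"; the function names and the encoding are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Threshold taken from the definition above: the minor allele must reach 1%.
SNP_MAF_THRESHOLD = 0.01


def minor_allele_frequency(genotypes):
    """Return the frequency of the less common allele at one site.

    `genotypes` is a list of diploid calls such as "AA", "AG", "GG"
    (a hypothetical encoding used only in this sketch). The site is
    assumed to be biallelic.
    """
    counts = Counter(allele for call in genotypes for allele in call)
    if len(counts) < 2:
        return 0.0  # monomorphic site: no variation observed
    total = sum(counts.values())
    # For a biallelic site, the minor allele is the less frequent one.
    return min(counts.values()) / total


def is_snp(genotypes, threshold=SNP_MAF_THRESHOLD):
    """Classify a site as a SNP if its minor allele frequency reaches the threshold."""
    return minor_allele_frequency(genotypes) >= threshold


# 50 individuals, 2 heterozygotes: 2 of 100 alleles are G, so MAF = 0.02.
common_site = ["AA"] * 48 + ["AG"] * 2
print(minor_allele_frequency(common_site), is_snp(common_site))
```

A variant seen once in 200 individuals (1 of 400 alleles, or 0.25%) would fall below the threshold and be treated as a rare variant rather than a SNP under this definition.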
In contrast to rare Mendelian diseases, which are caused mostly by
high-penetrance mutations, low-penetrance SNPs appear to be the most important
component of the heritability of common, complex human diseases.
Bioinformatics provides unprecedented power and resources for deciphering
such complex disorders, drawing on the vast amounts of data generated by new,
high-throughput genomic and proteomic technologies (Leung and Pang 2002).
Several programs have been developed to predict, in silico, the effects of
SNPs on protein function and gene transcriptional activity (Krishnan and
Westhead 2003; Ng and Henikoff 2002, 2003; Conde et al. 2004). Interest has
also surged in studying complex human diseases with SNP-based haplotypes,
and a number of haplotype phasing algorithms, which infer from unphased
genotype data which alleles lie together on the same chromosome, have been
developed (Niu 2004).
This section describes the major SNP and haplotype databases. For a list of
the databases in this area, see the HGVbase website at hgvbase.cgb.ki.se/.
NCBI dbSNP database http://www.ncbi.nlm.nih.gov/SNP
The NCBI dbSNP database is the central repository for SNPs (Sherry et al.
2001). Because dbSNP entries may be redundant, the submitted SNPs are grouped
into nonredundant sets by clustering those at identical genomic coordinates
into single, representative SNPs called reference SNPs (RefSNPs). RefSNP IDs
carry an rs prefix.
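The clustering step can be pictured as grouping submissions by genomic coordinate and giving each group one reference identifier. The following is a simplified sketch under stated assumptions: submissions are (submission_id, chromosome, position) tuples, the IDs are illustrative, and the sequential rs numbering is hypothetical. It does not reproduce how dbSNP actually assigns rs numbers or any additional criteria dbSNP may apply when clustering.

```python
from collections import defaultdict


def cluster_into_refsnps(submissions, first_rs=1):
    """Group submitted SNPs that share a genomic coordinate into one reference SNP.

    `submissions` is a list of (submission_id, chromosome, position) tuples.
    The sequential rs numbering is purely illustrative (hypothetical).
    """
    by_location = defaultdict(list)
    for sub_id, chrom, pos in submissions:
        # Identical coordinates mean the submissions describe the same SNP.
        by_location[(chrom, pos)].append(sub_id)

    refsnps = {}
    # Sort coordinates so the assigned IDs are deterministic.
    for offset, location in enumerate(sorted(by_location)):
        refsnps[f"rs{first_rs + offset}"] = {
            "location": location,
            "members": by_location[location],
        }
    return refsnps


# Two submissions at chr1:1000 collapse into a single reference SNP.
subs = [("ss1", "1", 1000), ("ss2", "1", 1000), ("ss3", "2", 500)]
for rs_id, record in cluster_into_refsnps(subs).items():
    print(rs_id, record["location"], record["members"])
```

Keying on the coordinate tuple makes redundancy removal a single dictionary pass; the redundant submissions remain available as members of their reference SNP.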
Data are available in a wide variety of formats, including flat files, ASN.1,
FASTA, and XML, whose structure is defined by an XSD schema. The URI for the
XSD schema is ftp://ftp.ncbi.nlm.nih.gov/snp/specs/genoex.xsd.
HGVbase hgvbase.cgb.ki.se/
The objective of the Human Genome Variation Database is to provide an
accurate, high-utility, and ultimately fully comprehensive catalog of normal
human gene and genome variation, serving as a research tool to help define
the genetic component of human phenotypic variation. All records are highly
curated and annotated to maximize utility and data accuracy.