124 5 Survey of Ontologies in Bioinformatics


HGVbase is the product of a collaboration between the Karolinska Institute
(Sweden) and the European Bioinformatics Institute (U.K.). Recently, a decision
was made to develop HGVbase into a phenotype/genotype database.
Data exchange with other databases is being maintained, but submissions
are not currently being accepted.
Database exchange of core information with dbSNP (Sherry et al. 2001)
ensures that HGVbase incorporates data from high-throughput discovery
efforts. Release 15 of HGVbase contains information on almost 3 million
SNPs, of which 29,000 are found in 10,000 genes and 41,000 have allele fre-
quency information. In HGVbase, the location of each represented variant
is presented in the context of available gene predictions, and SNPs within
or around genes are described as exonic, intronic, utr, or flank (within 2 kb
of the gene boundary). HGVbase currently considers only genes with a
HUGO nomenclature committee approved definition (Wain et al. 2002), as
represented in the Ensembl database (Hubbard et al. 2002). Nonsynonymous
SNPs are grouped into three broad classes based on their predicted effect on
the protein level: benign, possibly damaging, and probably damaging. The methods
used for these functional predictions are described in (Ng and Henikoff
2003; Ramensky et al. 2002).
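The location categories above can be illustrated with a minimal sketch. It is not HGVbase's actual procedure: the Gene record, the classify_snp function, and the "intergenic" fallback are hypothetical names introduced here, and the sketch assumes that utr means an exonic base outside the coding region. Only the four category names and the 2 kb flank threshold come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

FLANK_BP = 2000  # "within 2 kb of the gene boundary", as stated in the text


@dataclass
class Gene:
    """Hypothetical, simplified gene model (1-based inclusive coordinates)."""
    start: int
    end: int
    exons: List[Tuple[int, int]]  # exon intervals within [start, end]
    cds_start: int                # first coding base (assumption)
    cds_end: int                  # last coding base (assumption)


def classify_snp(pos: int, gene: Gene) -> str:
    """Return 'exonic', 'intronic', 'utr', 'flank', or 'intergenic'.

    'intergenic' is an illustrative fallback for positions farther than
    2 kb from the gene; it is not one of the categories named in the text.
    """
    if pos < gene.start or pos > gene.end:
        near = gene.start - FLANK_BP <= pos <= gene.end + FLANK_BP
        return "flank" if near else "intergenic"
    for ex_start, ex_end in gene.exons:
        if ex_start <= pos <= ex_end:
            # Assumption: exonic bases outside the coding region are UTR.
            if pos < gene.cds_start or pos > gene.cds_end:
                return "utr"
            return "exonic"
    # Inside the gene but in no exon.
    return "intronic"


# Made-up coordinates, for illustration only.
example_gene = Gene(
    start=10000,
    end=20000,
    exons=[(10000, 10500), (15000, 15400), (19000, 20000)],
    cds_start=10200,
    cds_end=19500,
)
for position in (8500, 10100, 10300, 12000):
    print(position, classify_snp(position, example_gene))
```

A real pipeline would also need to handle strand, overlapping genes, and alternative transcripts, all of which this sketch ignores.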
HGVbase is available in XML, FASTA, MySQL, and flat file formats. The
XML format is specified by the XML DTD at
ftp://ftp.ebi.ac.uk/pub/databases/variantdbs/hgbase/hgvbase.dtd.
Ensembl http://www.ensembl.org/
Ensembl is a comprehensive source of stable automatic annotation of individual
genomes, and of the synteny and orthology relationships between
them (Birney et al. 2004). It is also a framework for integration of any bi-
ological data that can be mapped onto features derived from the genomic
sequence, including SNPs.
Data can be obtained in a variety of formats, including FASTA format, flat
files, GenBank format, and MySQL database dump format. The flat file for-
mat does not include all the data.
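As an illustration of working with the FASTA downloads mentioned above, the following is a minimal reader for the generic FASTA layout (a ">" header line followed by sequence lines). The function name read_fasta and the sample records are hypothetical, and nothing here reflects Ensembl-specific header conventions.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its concatenated sequence.

    The identifier is taken to be the first whitespace-delimited token
    after '>' (an assumption; header layouts vary between providers).
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up records, for illustration only.
sample = [
    ">seq1 hypothetical description",
    "ACGT",
    "TTGA",
    "",
    ">seq2",
    "GGCC",
]
print(read_fasta(sample))
```

For a file on disk, the same function can be applied to an open file handle, since iterating over the handle yields its lines.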
SNP500Cancer snp500cancer.nci.nih.gov
The Cancer Genome Anatomy Project (CGAP) was designed to provide pub-
lic data sets, material resources, and informatics tools to serve as a plat-
form to support the elucidation of the molecular signatures of cancer (Straus-
berg 2001; Strausberg et al. 2001). The SNP500Cancer Database provides se-
quence and genotype assay information for candidate SNPs useful in map-
ping complex diseases such as cancer. The database is an integral compo-