Catalyzing Inquiry at the Interface of Computing and Biology

The CCDB contains structural and protein distribution information derived from confocal, multiphoton,
and electron microscopy, including correlated microscopy. Its main mission is to make high-resolution
data derived from electron tomography and high-resolution light microscopy available to the scientific
community. It is positioned between whole-brain imaging databases such as the MAP project^47 and
databases of protein structures determined by electron microscopy, nuclear magnetic resonance (NMR)
spectroscopy, and X-ray crystallography (e.g., the Protein Data Bank and EMBL).
The CCDB serves as a research prototype for investigating new methods of representing imaging data in
a relational database system so that data-mining approaches can be applied to the content of the images.
The CCDB data model addresses the practical problem of managing the large volumes of imaging data and
associated metadata generated in a modern microscopy laboratory. In addition, the data model must
ensure that data within the CCDB can be related to data taken at different scales and with different
modalities.
The data model of the CCDB was designed around the process of three-dimensional reconstruction from
two-dimensional micrographs, capturing key steps in the process from experiment to analysis. (Figure 4.1
illustrates the schema and entity relationships of the CCDB.) The types of imaging data stored in the
CCDB are quite heterogeneous, ranging from large-scale maps of protein distributions acquired by confocal
microscopy to three-dimensional reconstructions of individual cells, subcellular structures, and
organelles. The CCDB can accommodate data from tissues and cultured cells regardless of tissue of
origin, but because of its emphasis on the nervous system, the data model contains several features
specialized for neural data. For each dataset, the CCDB stores not only the original images and
three-dimensional reconstruction but also any analysis products derived from these data, including
segmented objects and measurements of quantities such as surface area, volume, length, and diameter.
Users have access to full-resolution imaging data of every type available for a particular dataset
(e.g., raw data, three-dimensional reconstructions, segmented volumes).
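
A relational layout that follows the reconstruction process from experiment to analysis can be sketched
as below. This is a minimal sketch using Python's standard `sqlite3` module, not the CCDB's actual schema
(which Figure 4.1 depicts): every table and column name, the tilt-series framing, and the example values
are hypothetical. Each segmented object points to the reconstruction it came from, which in turn points
to a dataset holding the raw images and preparation details, so any measured object can be traced back
to its source data.

```python
import sqlite3

# Hypothetical, simplified schema (not the CCDB's real one): each level points
# to the level it was derived from, mirroring experiment -> raw images ->
# reconstruction -> segmented objects.
SCHEMA = """
CREATE TABLE dataset (
    id             INTEGER PRIMARY KEY,
    tissue         TEXT NOT NULL,
    prep_method    TEXT NOT NULL,   -- specimen preparation
    imaging_method TEXT NOT NULL    -- e.g., electron tomography
);
CREATE TABLE raw_image (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    tilt_angle REAL                 -- one 2D micrograph of a tilt series
);
CREATE TABLE reconstruction (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    algorithm  TEXT NOT NULL        -- how the 3D volume was computed
);
CREATE TABLE segmented_object (
    id                INTEGER PRIMARY KEY,
    reconstruction_id INTEGER NOT NULL REFERENCES reconstruction(id),
    name              TEXT NOT NULL,
    volume_um3        REAL,
    surface_area_um2  REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one made-up dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dataset VALUES "
                 "(1, 'cerebellum', 'high-pressure freezing', 'electron tomography')")
    # A tilt series from -60 to +60 degrees in 2-degree steps (61 images).
    conn.executemany("INSERT INTO raw_image (dataset_id, tilt_angle) VALUES (1, ?)",
                     [(float(a),) for a in range(-60, 61, 2)])
    conn.execute("INSERT INTO reconstruction VALUES (1, 1, 'weighted back-projection')")
    conn.execute("INSERT INTO segmented_object VALUES (1, 1, 'dendritic spine', 0.12, 1.5)")
    return conn


def provenance(conn: sqlite3.Connection, object_id: int):
    """Trace a segmented object back to its reconstruction, dataset, and raw-image count."""
    return conn.execute(
        """SELECT s.name, r.algorithm, d.prep_method, d.imaging_method,
                  (SELECT COUNT(*) FROM raw_image ri WHERE ri.dataset_id = d.id)
           FROM segmented_object s
           JOIN reconstruction r ON r.id = s.reconstruction_id
           JOIN dataset d ON d.id = r.dataset_id
           WHERE s.id = ?""",
        (object_id,),
    ).fetchone()


print(provenance(build_demo_db(), 1))
# ('dendritic spine', 'weighted back-projection', 'high-pressure freezing', 'electron tomography', 61)
```

Keeping provenance in foreign keys rather than in free-text notes is what lets a single join answer the
question "where did this measurement come from?"
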
In this model, a three-dimensional reconstruction is viewed as one interpretation of a set of raw data,
an interpretation that is highly dependent on the specimen preparation and imaging methods used to
acquire the data. Thus, a single record in the CCDB consists of a set of raw microscope images and any
volumes, images, or data derived from them, along with a rich set of methodological details. These
derived products include reconstructions, animations, correlated volumes, and the results of any
segmentation or analysis performed on the data. Because the CCDB presents all of the raw data,
together with the reconstructed and processed data and a thorough description of how the specimen was
prepared and imaged, researchers are free to extract additional content from micrographs that the
original author may not have analyzed, or to apply additional alignment, reconstruction, or
segmentation algorithms to the data.
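
The idea that a reconstruction is one interpretation of the raw data, rather than the data itself, can be
illustrated with a small in-memory model. The sketch below is hypothetical: the class names, fields,
identifiers, and the two reconstruction methods (weighted back-projection and SIRT, both standard
tomographic techniques used here only as examples) are assumptions, not CCDB structures. It lets one
record hold several competing reconstructions of the same raw images and refuses any derived product
whose sources do not belong to the record.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Methods:
    """Hypothetical subset of the methodological details stored with a record."""
    preparation: str
    imaging: str


@dataclass(frozen=True)
class DerivedProduct:
    product_id: str
    kind: str                 # e.g., "reconstruction", "segmentation", "animation"
    method: str               # algorithm or tool that produced it
    sources: tuple[str, ...]  # ids of raw images or earlier products it was derived from


@dataclass
class Record:
    """One record: raw images, methodological details, and everything derived from them."""
    record_id: str
    raw_image_ids: list[str]
    methods: Methods
    derived: list[DerivedProduct] = field(default_factory=list)

    def known_ids(self) -> set[str]:
        return set(self.raw_image_ids) | {p.product_id for p in self.derived}

    def add_derived(self, product: DerivedProduct) -> None:
        """Attach a product only if all of its sources already belong to this record."""
        missing = set(product.sources) - self.known_ids()
        if missing:
            raise ValueError(f"unknown sources: {sorted(missing)}")
        self.derived.append(product)

    def interpretations(self, kind: str) -> list[DerivedProduct]:
        """All products of one kind, e.g., competing reconstructions of the same raw data."""
        return [p for p in self.derived if p.kind == kind]


# Usage: two reconstructions of the same three (made-up) micrographs,
# and a segmentation derived from only one of them.
raw = [f"img_{i:03d}" for i in range(3)]
rec = Record("demo-1", raw, Methods("chemical fixation", "electron tomography"))
rec.add_derived(DerivedProduct("recon-a", "reconstruction", "weighted back-projection", tuple(raw)))
rec.add_derived(DerivedProduct("recon-b", "reconstruction", "SIRT", tuple(raw)))
rec.add_derived(DerivedProduct("seg-1", "segmentation", "manual tracing", ("recon-b",)))
print([p.method for p in rec.interpretations("reconstruction")])
# ['weighted back-projection', 'SIRT']
```

Because the raw images stay in the record, a later user can add a third reconstruction with a different
algorithm without disturbing the products that earlier researchers derived.
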
The utility of image databases depends on the ability to query them both on the basis of descriptive
attributes and on the basis of their contents. Of these two types of query, querying images by content is
by far the more challenging. Although computer algorithms for identifying and extracting features from
image data are advancing,^48 it is unlikely that any algorithm will be able to match the skill of an
experienced microscopist for many years.
The CCDB project addresses this problem in two ways. One currently supported way is to store the
results of segmentations and analyses performed by individual researchers on the datasets stored in the
CCDB. The CCDB allows each object segmented from a reconstruction to be stored as a separate object in
the database, along with any quantitative information derived from it. The list of segmented objects
and their morphometric quantities provides a means to query a dataset on the basis of features contained
in the data, such as object name (e.g., dendritic spine) or quantities such as surface area, volume, and
length.
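
Once segmented objects and their measurements are stored as rows, a feature-based query reduces to an
ordinary SQL filter. The sketch below uses Python's standard `sqlite3` module and assumes a hypothetical
single-table layout with made-up values and units (µm², µm³, µm); it returns the datasets that contain an
object with a given name whose volume meets a threshold.

```python
import sqlite3


def make_objects_table() -> sqlite3.Connection:
    """In-memory table of segmented objects; layout, units, and values are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE segmented_object (
               dataset_id       INTEGER,
               name             TEXT,
               surface_area_um2 REAL,
               volume_um3       REAL,
               length_um        REAL)"""
    )
    conn.executemany(
        "INSERT INTO segmented_object VALUES (?, ?, ?, ?, ?)",
        [
            (1, "dendritic spine", 1.4, 0.10, 1.2),
            (1, "dendritic spine", 2.1, 0.25, 1.8),
            (2, "dendritic spine", 0.9, 0.05, 0.7),
            (2, "mitochondrion", 3.0, 0.40, 2.5),
        ],
    )
    return conn


def datasets_with(conn: sqlite3.Connection, name: str, min_volume_um3: float) -> list[int]:
    """Datasets containing at least one object with this name and volume >= the threshold."""
    rows = conn.execute(
        """SELECT DISTINCT dataset_id FROM segmented_object
           WHERE name = ? AND volume_um3 >= ?
           ORDER BY dataset_id""",
        (name, min_volume_um3),
    )
    return [row[0] for row in rows]


conn = make_objects_table()
print(datasets_with(conn, "dendritic spine", 0.2))  # [1]
print(datasets_with(conn, "dendritic spine", 0.0))  # [1, 2]
```

Exact string matching on `name` only works if every researcher labels objects with the same term; the
sketch simply assumes that they do.
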


^47 A. MacKenzie-Graham, E.S. Jones, D.W. Shattuck, I. Dinov, M. Bota, and A.W. Toga, “The Informatics of a C57BL/6 Mouse Brain Atlas,” Neuroinformatics 1(4):397-410, 2003.
^48 U. Sinha, A. Bui, R. Taira, J. Dionisio, C. Morioka, D. Johnson, and H. Kangarloo, “A Review of Medical Imaging Informatics,” Annals of the New York Academy of Sciences 980:168-197, 2002.
