Computational Systems Biology Methods and Protocols.7z

concentration, temperature, and excessive dilution. In general, spectra are first aligned to an internal standard to remove global shifts. This is not sufficient, however, because individual peaks within a spectrum can still be shifted, so a refined, local alignment of the peaks should also be conducted. For this purpose, various alignment algorithms have been proposed, including interval correlated shifting (icoshift) [91], correlation optimized warping (COW) [92], fuzzy warping, and hierarchical cluster-based peak alignment (CluPA) [93].

The NMR spectra are then divided into segments (so-called bins or buckets), and the spectral intensity within each bin is calculated. Binning attempts to minimize the effects of variations in peak positions caused by the pH, ionic strength, and composition of a sample. Binning reduces the dimensionality for multivariate statistical analysis but also decreases the spectral resolution. The most common type of spectral binning is equidistant binning with a bin width of 0.04 ppm, meaning that every spectrum is divided into evenly spaced integral regions with a spectral width of 0.04 ppm. A drawback of equal-size binning is the inflexibility of the bin boundaries, so a peak is sometimes split between two bins. To prevent peak splitting, several newer binning methods have been proposed [94–96].

As for peak normalization, constant sum normalization (CSN) is the most commonly used method, in which the total spectral peak area is treated as constant across all spectra. In addition, histogram matching (HM) normalization, volume normalization, and weight normalization (dry weight/wet weight) are also used in metabolomics studies [55]. After normalization, a scaling step is conducted to avoid the situation where the highest peaks are the most influential in the multivariate data models. Several scaling approaches have been proposed, such as auto scaling, Pareto scaling, mean centering, and range scaling.
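
The binning, normalization, and scaling steps described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools: the function names (`equidistant_bins`, `constant_sum_normalize`, `pareto_scale`) and the choice to start the first bin at the lowest ppm value are assumptions made for illustration. Pareto scaling is taken here in its usual form, in which each variable is mean-centered and then divided by the square root of its standard deviation.

```python
import math


def equidistant_bins(ppm, intensity, width=0.04):
    """Sum intensities into evenly spaced bins of `width` ppm.

    Assumption for this sketch: the first bin starts at min(ppm).
    Returns (left_edges, bin_sums).
    """
    start = min(ppm)
    # Rounding guards against floating-point noise at bin boundaries.
    n_bins = int(math.floor(round((max(ppm) - start) / width, 9))) + 1
    sums = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        idx = min(int(round((x - start) / width, 9)), n_bins - 1)
        sums[idx] += y
    edges = [start + i * width for i in range(n_bins)]
    return edges, sums


def constant_sum_normalize(spectrum, total=1.0):
    """Rescale one binned spectrum so that its intensities sum to `total`."""
    s = sum(spectrum)
    if s == 0:
        raise ValueError("cannot normalize a spectrum with zero total intensity")
    return [total * v / s for v in spectrum]


def pareto_scale(matrix):
    """Pareto-scale a samples x variables matrix (list of rows).

    Each column is mean-centered and divided by the square root of its
    sample standard deviation. Constant columns are only centered.
    Requires at least two samples (rows).
    """
    n = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        denom = math.sqrt(sd) if sd > 0 else 1.0
        scaled_cols.append([(v - mean) / denom for v in col])
    # Transpose back to samples x variables.
    return [list(row) for row in zip(*scaled_cols)]
```

In this sketch each spectrum would be binned and normalized on its own, and the resulting rows stacked into a samples-by-bins matrix before `pareto_scale` is applied column-wise. A fixed bin grid shared by all spectra would be needed in practice so that columns correspond to the same ppm regions.
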
Among these approaches, Pareto scaling is the most commonly used, in which each variable is divided by the square root of the standard deviation of the column values.

Although a number of metabolites are detected by 1D ¹H NMR methods, the identification of metabolites in biological samples is still a major challenge because many peaks tend to overlap in 1D ¹H NMR spectra. This problem can be solved by the 2D ¹H-¹H TOCSY and 2D ¹H-¹³C HSQC approaches. After spectral data preprocessing, the known metabolites are characterized by querying libraries of NMR spectra (1D and 2D). Many NMR spectral databases have now been constructed, including the Human Metabolome Database (HMDB) [97, 98], the Biological Magnetic Resonance Data Bank (BMRB) [99], the Madison-Qingdao Metabolomics Consortium Database (MQMCD) [100], and the MetaboLights database [101]. Various forms of queries can be performed to extract metabolite information from these databases. Users can submit the structure, chemical shift, or J-coupling constant of metabolites to these databases for query, and relevant biological

280 Jing Cheng et al.

