Computational Systems Biology Methods and Protocols.7z

associations” [11]. The replication samples should ideally be
larger, so that the study is well powered to identify spuriously
associated SNPs (see Note 13).


  1. Guidelines for conducting replication studies were suggested
    by the NCI-NHGRI Working Group on Replication in Association
    Studies [12].


3.6 Statistical Analysis


Larger sample sizes give a greater chance of identifying genetic
factors that have a more modest effect. A common approach is for
each group to run its own GWA analysis and then to combine the
results from several studies by performing a meta-analysis for each
genetic variant. Below we show a typical procedure for obtaining
statistically significant results in a GWAS.


  1. Create a Manhattan plot with HaploView v4.1 [13] or a
    quantile-quantile (Q-Q) plot with R 2.8.1 for data
    visualization. Either plot provides a visual summary of the
    association test results that draws immediate attention to any
    significant regions.
    Q-Q plot: One way to evaluate whether there is an excess of
    significant results at a given threshold is to plot the P-values
    from the association test against the P-values expected under a
    uniform distribution. Correcting for population stratification
    can reduce this excess, that is, the false-positive associations
    that are not due to true genetic signals.

  2. Perform fine mapping around the newly identified susceptibility
    gene locus by genotyping tag SNPs and performing imputation
    [14] (see Note 14).

  3. Meta-analysis is useful for the replication of initial association
    results, and can increase the power and the opportunity to
    identify novel signals associated with a disease. When
    performing a meta-analysis, one must consider heterogeneity
    between the studies. For example, when the WTCCC performed a
    GWAS of T2D, they showed strong evidence of association of
    variants at the FTO locus. However, a couple of other studies
    that were doing association analysis of T2D at the same time
    did not show the same result. This is because the WTCCC cases
    were more obese than the controls in that study, whereas in the
    other diabetes studies the case-control selection had been more
    balanced with respect to body size. Identifying this source of
    heterogeneity between the studies led to the identification of
    FTO as a BMI gene.
    (a) Meta-analysis can be conducted in PLINK to combine data
    from multiple GWA studies and to provide a quantitative
    evaluation of the consistency, inconsistency, or heterogeneity
    of the results across datasets (see Note 15).
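
The Q-Q comparison described in step 1 can be sketched in a few lines of Python. This is only an illustration, not the R procedure used in the protocol: the function names and the example P-values are hypothetical. The genomic inflation factor (lambda) is not mentioned in the text; it is added here as an assumption, because it is a common single-number summary of the excess that a Q-Q plot shows. Two-sided P-values from 1-df tests are assumed.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(P), most significant first."""
    n = len(p_values)
    observed = sorted(p_values)  # ascending, so the smallest P comes first
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n  # quantile of Uniform(0, 1) for this rank
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


def genomic_inflation(p_values):
    """Ratio of the observed median 1-df chi-square statistic to its null median."""
    normal = NormalDist()
    # A two-sided P-value corresponds to chi-square = z**2 with z = Phi^-1(P / 2).
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Made-up P-values, for illustration only.
    example = [0.91, 0.02, 0.47, 0.33, 0.0004, 0.76, 0.15, 0.58]
    for expected, observed in qq_points(example):
        print(f"{expected:.3f}\t{observed:.3f}")
    print("lambda:", round(genomic_inflation(example), 3))
```

Points lying on the diagonal (observed equal to expected) indicate no excess. A lambda well above 1 suggests inflation across the whole distribution, which may reflect population stratification rather than true signals.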

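To make the idea in step 3 concrete, the following minimal sketch combines per-study effect estimates for a single variant with a fixed-effect, inverse-variance meta-analysis and reports Cochran's Q and the I² heterogeneity index. It is not PLINK's implementation. The function name `fixed_effect_meta` and the input numbers are hypothetical, and it assumes that each study supplies an effect estimate (for example a log odds ratio) and its standard error for the same allele.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis for one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors, in the same order
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies weigh more
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"beta": beta, "se": se, "z": beta / se, "Q": q, "df": df, "I2": i2}


if __name__ == "__main__":
    # Made-up estimates from three hypothetical studies.
    result = fixed_effect_meta([0.12, 0.31, 0.05], [0.05, 0.08, 0.06])
    for key, value in result.items():
        print(key, round(value, 4))
```

A large Q relative to its degrees of freedom, or a high I², signals that the studies disagree more than chance would explain. That is the situation described for the FTO locus, and it is a prompt to look for differences in study design, such as case-control selection.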

104 Michelle Chang et al.
