Computational Systems Biology: Methods and Protocols

If no such panel is available for the population from which cases
were derived, check if there are other epidemiological studies that
included population-based controls with phenotypic information
for whom DNA may already have been collected.


  1. Recruit healthy control samples and disease-affected case samples
    with available information on their ethnicity, age, sex,
    physical condition, and geographical area (see Note 4). A
    large sample size is required in genome-wide association studies
    (see Note 5).

  2. Extract the genomic DNA of all case and control individuals.

  3. Conduct genome-wide genotyping analysis by using chip-based
    microarray technology to assay 1 million or more
    SNPs. On some arrays, if particular variants of interest
    are missing from the panel, such as less common or rare variants,
    the user can add an additional 10,000 or 50,000
    single-nucleotide variants. Two main platforms are used for
    most GWAS: Affymetrix and Illumina (see Note 1).
    You can skip steps 2 and 3 if you have access to samples that
    are already genotyped (see Note 6).

  4. Compare the case group with the control group and collect the SNPs
    that are significant (i.e., P-value < 1e-7) (see Note 7).
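The comparison in step 4 can be illustrated with a minimal sketch. It assumes a simple 2x2 allelic chi-square test with 1 degree of freedom, which is only one of several tests used in practice; the protocol itself does not prescribe a specific test. The function names, SNP identifiers, and allele counts below are hypothetical and serve only as an illustration. The threshold `alpha=1e-7` is the cutoff quoted in step 4.

```python
import math

def allelic_chi2(case_counts, control_counts):
    """2x2 allelic chi-square test (1 degree of freedom).

    Each argument is (minor_allele_count, major_allele_count).
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        # Monomorphic SNP or empty group: the test is not defined.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def significant_snps(counts_by_snp, alpha=1e-7):
    """Return [(snp_id, p_value)] with p_value < alpha, smallest first."""
    hits = []
    for snp_id, (case_counts, control_counts) in counts_by_snp.items():
        _, p_value = allelic_chi2(case_counts, control_counts)
        if p_value < alpha:
            hits.append((snp_id, p_value))
    return sorted(hits, key=lambda item: item[1])

# Hypothetical allele counts: (minor, major) in cases, then in controls.
toy_counts = {
    "snp_A": ((600, 1400), (400, 1600)),  # clear frequency difference
    "snp_B": ((500, 1500), (480, 1520)),  # nearly identical frequencies
}
hits = significant_snps(toy_counts)
print([snp_id for snp_id, _ in hits])
```

Real analyses usually rely on dedicated GWAS software and often adjust for covariates such as ancestry; this sketch only shows the case-versus-control comparison and the thresholding.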


3.3 Quality Control for Individuals and SNPs


Identifying and excluding poor-quality samples helps to
avoid errors in the data that might lead to false-positive or
false-negative associations. Samples whose genotyping success rate
falls below some threshold, for example 95% of the SNPs, are treated
as failed. The more SNPs that fail in a sample, the more the SNPs that
did succeed are called into question, since they may also be
generating inaccurate genotypes. It could be that heterozygotes are
being miscalled as homozygotes for particular alleles. An excess of
heterozygous genotypes suggests that a DNA sample might be a mixture
of two DNA samples. Here we give an overview of the QC issues for the
samples and the SNP-based genotyping methods used in GWAS.
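As a rough illustration of the sample-level checks described above, the following sketch computes a per-individual call rate and heterozygosity and flags samples that fall outside chosen limits. The genotype coding (0, 1, 2 copies of the minor allele, `None` for a failed call), the function names, and the toy data are assumptions made for this example. The default thresholds echo figures quoted in this section (a 95% success rate and a 23 to 30% heterozygosity window) and should be tuned for each study.

```python
def sample_metrics(calls):
    """Return (call_rate, heterozygosity) for one sample.

    calls: list of 0, 1, or 2 (minor-allele copies), or None for a failed call.
    Heterozygosity is the fraction of successful calls that are heterozygous.
    """
    called = [g for g in calls if g is not None]
    call_rate = len(called) / len(calls) if calls else 0.0
    het = sum(1 for g in called if g == 1) / len(called) if called else 0.0
    return call_rate, het

def filter_samples(genotypes, min_call_rate=0.95, het_range=(0.23, 0.30)):
    """Split samples into a keep list and a dict of {sample_id: reason dropped}."""
    keep, drop = [], {}
    low, high = het_range
    for sample_id, calls in genotypes.items():
        call_rate, het = sample_metrics(calls)
        if call_rate < min_call_rate:
            drop[sample_id] = "low call rate"
        elif not (low <= het <= high):
            drop[sample_id] = "heterozygosity out of range"
        else:
            keep.append(sample_id)
    return keep, drop

# Hypothetical toy data: 20 SNPs per sample.
toy_genotypes = {
    "s1": [1] * 5 + [0] * 13 + [2] * 2,   # complete calls, 25% heterozygous
    "s2": [1] * 5 + [0] * 11 + [None] * 4,  # 4 of 20 calls failed
    "s3": [1] * 12 + [0] * 8,             # excess heterozygosity
}
keep, drop = filter_samples(toy_genotypes)
print(keep, drop)
```

Call rate is checked first, so a sample with many failed calls is reported for that reason even if its heterozygosity is also unusual.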


  1. Screen the samples for (1) per-individual call rate > 80-90%
    and (2) per-individual heterozygosity between 23 and 30%, and
    remove those that do not qualify.

  2. Screen the SNPs for (1) per-SNP call
    rate > 90% (see Note 8), (2) minor allele frequency (MAF) > 3%
    (see Note 9), and (3) Hardy-Weinberg equilibrium (HWE)
    holding in both controls and cases (see Note 10).
    (a) One can detect SNPs that are of poor quality by looking
    for a genotyping success rate less than 95%, which is a
    threshold commonly used; often the analyses are done
    using a small percentage of samples that are duplicated
    and present twice within the set of samples being


102 Michelle Chang et al.
