genotyped. The genotyping results from the duplicate
samples are compared, and mismatches between the identical
samples indicate that the genotype calls for that SNP are
unreliable.
(b) Test for Hardy-Weinberg (HW) equilibrium and look for
SNPs whose observed genotype frequencies are not consistent
with the proportions expected from the observed allele
frequencies. These statistical tests can be used to identify
genotyping errors. If the samples include related individuals,
such as parent-child trios, one can also check that the
alleles of each child are consistent with Mendelian
inheritance from the parents.
(c) Some groups add quality control samples to their
genotyping runs so that specific types of SNP error can be
detected.
(d) Avoid sample switches that can happen during the process
of moving DNA samples to be genotyped. Use genotype
data to evaluate whether the sex of the sample matches the
expected sex of the individual.
(e) A fully automated pipeline for analysis and reporting of
QC results for Illumina SNP data is available at
http://www-personal.une.edu.au/~cgondro2/CGhomepage.
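Two of the checks above, the duplicate-sample comparison in (a) and the HW test in (b), can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline referenced in (e): the function names, the genotype-count arguments, and the use of None for a missing call are all hypothetical, and the HW test shown is the simple one-degree-of-freedom chi-square test rather than an exact test.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at one SNP.

    Arguments are the observed counts of the three genotypes
    (hypothetical coding: AA, AB, BB). Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls for this SNP")
    # Allele frequency of A estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def duplicate_concordance(calls_a, calls_b):
    """Fraction of matching genotype calls between two runs of one sample.

    Positions where either call is None (assumed missing) are ignored.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no SNPs called in both duplicates")
    matches = sum(1 for a, b in pairs if a == b)
    return matches / len(pairs)


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))   # counts in exact HW proportions
    print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
    print(duplicate_concordance(["AA", "AB", "BB", None],
                                ["AA", "AB", "AB", "BB"]))
```

A SNP with a very small HW P-value in controls, or a duplicate pair with low concordance, would be flagged for review; the cutoffs used for flagging are a study-specific choice and are not fixed by this sketch.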
3.4 Adjustment for Multiple Testing
A commonly used genome-wide significance threshold is based on
the number of common variants being tested across the
population. A P-value threshold of 5 × 10^-8 declares that a
particular result is significant. Reaching a threshold this
stringent requires either a large effect of that particular
variant or a large sample size to detect a more modest effect.
Here we discuss the methods used to adjust for multiple testing
in GWAS.
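The link between the number of tests and the per-test threshold can be made concrete with a small calculation. The sketch below assumes a family-wise alpha of 0.05 and roughly one million independent common variants; both figures are illustrative assumptions rather than values taken from a specific dataset, and the function name is hypothetical.

```python
def per_test_threshold(alpha, n_tests):
    """Per-test P-value threshold that keeps the family-wise error
    rate at or below alpha across n_tests independent tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Assumed: alpha = 0.05 and about one million independent tests,
    # which gives a threshold on the order of 5e-8.
    print(per_test_threshold(0.05, 1_000_000))
```

With fewer independent tests, for example in a candidate-region study, the same alpha gives a less stringent per-test threshold.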
- Perform Bonferroni correction for multiple testing (see Note
11). SNP markers whose P-values remain below 0.05 after
Bonferroni correction are considered significantly associated
with the traits of the disease (see Note 12).
- Determine the false discovery rate (FDR) to estimate the
proportion of significant results (usually at alpha = 0.05)
that are false positives while retaining the true results.
- Perform permutation testing with software packages such as
PLINK or PRESTO to generate the empirical distribution of test
statistics for a given dataset.
- You can also obtain per-SNP significance thresholds for a given
family-wise error rate (FWER) from Hoggart et al. [10].
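The first three adjustments in the list above can be sketched in plain Python. This is a minimal illustration, not the implementation used by PLINK or PRESTO. The FDR step is shown with the Benjamini-Hochberg procedure, which is one common choice and an assumption here. The permutation step uses a deliberately simple statistic, the absolute difference in mean allele count between cases and controls at a single SNP. All function names and the 0/1/2 genotype coding are hypothetical.

```python
import random


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def permutation_p_value(genotypes, is_case, n_perm=1000, seed=0):
    """Empirical P-value for one SNP by shuffling case/control labels.

    genotypes: allele counts coded 0/1/2 (assumed coding).
    is_case:   booleans, True for cases; both groups must be non-empty.
    """
    def stat(labels):
        cases = [g for g, c in zip(genotypes, labels) if c]
        controls = [g for g, c in zip(genotypes, labels) if not c]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    rng = random.Random(seed)
    observed = stat(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]
    print(bonferroni_adjust(raw))
    print(benjamini_hochberg(raw))
    print(permutation_p_value([2] * 10 + [0] * 10, [True] * 10 + [False] * 10))
```

In a real GWAS the permutation is usually applied to the maximum test statistic across all SNPs, which controls the family-wise error rate while respecting linkage disequilibrium. The single-SNP version here only shows how the empirical distribution is built.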
3.5 Design Replication Studies
- After candidate disease-susceptibility SNPs have been
identified in the steps above, replication studies are needed to
distinguish between “statistical artifacts” and “true