Statistical Analysis for Education and Psychology Researchers

Interpretation of the Test Statistics W:Normal (or D:Normal)

In SAS the normal option in PROC UNIVARIATE tests the null hypothesis that the
sample data represent a random sample from a normal distribution. One of two statistical
tests is performed by PROC UNIVARIATE, depending upon the sample size. If the
sample size is greater than 2000 then the Kolmogorov test is used, denoted as D:Normal
in the SAS output. With smaller samples the Shapiro-Wilk test is used, denoted by
W:Normal in the SAS output.
Interpretation of the hypothesis test is straightforward. If the p-value obtained for the
test statistic (either W:Normal or D:Normal) is less than the significance level you have
chosen (usually 0.05 or 0.15) then the null hypothesis is rejected and you can conclude
that the data do not come from a normal distribution.
Care should be taken when choosing your significance level. For small sample sizes,
say <30, a liberal significance level such as 0.15 is suggested. For larger sample sizes a
significance level of 0.05 is more appropriate. You should be aware that with large
samples small departures from normality will be detected as significant even when using
a significance level of 0.05. These small departures are generally of no practical
consequence, but you should consider other information such as data plots and normal
probability plots to determine whether the sample data plausibly come from a normal
distribution.
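
The rules described above can be collected into a short sketch. This is an illustration in Python rather than SAS, and the function names are hypothetical; only the cut-offs (2000 observations for the choice of test, 30 observations for the choice of significance level) come from the text. The p-values in the usage lines are made up.

```python
def normality_test_label(n):
    """Return the statistic PROC UNIVARIATE reports for a sample of size n."""
    # Rule as stated in the text: more than 2000 observations -> Kolmogorov D,
    # otherwise Shapiro-Wilk W.
    return "D:Normal" if n > 2000 else "W:Normal"


def suggested_alpha(n):
    """Significance level suggested in the text for a sample of size n."""
    # Small samples (n < 30) get the more liberal level of 0.15.
    return 0.15 if n < 30 else 0.05


def reject_normality(p_value, n, alpha=None):
    """True when the null hypothesis of normality is rejected."""
    if alpha is None:
        alpha = suggested_alpha(n)
    return p_value < alpha


# Hypothetical p-values, for illustration only.
print(normality_test_label(25), reject_normality(0.10, n=25))    # W:Normal True
print(normality_test_label(500), reject_normality(0.10, n=500))  # W:Normal False
```

The same p-value of 0.10 leads to rejection with 25 observations but not with 500, which is why the significance level should be fixed before looking at the output.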


Normal Probability Plot

This is a descriptive technique for checking normality in a data distribution. Normal
probability plots are produced by the option plot in PROC UNIVARIATE. Normal
probability plots are shown for the three score distributions in Figures 5.9, 5.10 and 5.11.
In a normal probability plot ranked data values are plotted (y axis) against standardized
expected values based on a normal distribution (x axis). When the data are normally
distributed each data value will be close to its expected value and hence the plot will
approximate a straight line.
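
As a rough sketch of how the plotted coordinates could be obtained, the Python below pairs each ranked data value with an expected standard normal value. The plotting position (i - 3/8)/(n + 1/4) is an assumption made for this illustration (it is one common choice), the function names are hypothetical, and the scores are made up.

```python
from statistics import NormalDist


def normal_scores(n):
    """Expected standard normal values for the n ranked positions."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 3/8) / (n + 1/4) is an assumed choice.
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_points(data):
    """Pair each expected normal value (x) with the ranked data value (y)."""
    ranked = sorted(data)
    return list(zip(normal_scores(len(ranked)), ranked))


scores = [52, 61, 47, 58, 55, 49, 64, 56, 53]  # made-up values
for x, y in probability_plot_points(scores):
    print(f"{x:6.2f}  {y}")
```

Plotting the second column against the first gives the normal probability plot; the closer the points lie to a straight line, the more plausible a normal distribution is.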
Interpretation of the plot is straightforward. Data points are plotted by an * and a
theoretically normal distribution is plotted by a +, which forms a straight line. So if the
data are from a normal distribution the * will cover the + and form a straight line.
Therefore a small number of + and a correspondingly large number of * forming a
straight line will indicate a normal distribution. See, for example, the normal probability
plot for the variable VOCAB in Figure 5.11. The approximate normal distribution of this
variable can be checked by examining the distribution (histogram), stem and leaf, and
box and whisker plots.
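
To make the comparison of * and + concrete, the sketch below measures how far each ranked data value lies from a reference line. It assumes the reference line is built from the sample mean and standard deviation and uses the plotting position (i - 3/8)/(n + 1/4); both are assumptions for this illustration, the function name is hypothetical, and the two samples are made up.

```python
from statistics import NormalDist, mean, stdev


def reference_line_gaps(data):
    """Gap between each ranked value (*) and the reference line (+)."""
    ranked = sorted(data)
    n = len(ranked)
    centre, spread = mean(ranked), stdev(ranked)
    std_normal = NormalDist()
    gaps = []
    for i, value in enumerate(ranked, start=1):
        z = std_normal.inv_cdf((i - 0.375) / (n + 0.25))  # assumed plotting position
        expected = centre + spread * z  # height of the + at this position
        gaps.append(value - expected)
    return gaps


symmetric = [47, 49, 52, 53, 55, 56, 58, 61, 64]  # made-up, roughly bell-shaped
skewed = [1, 1, 2, 2, 3, 4, 6, 11, 25]            # made-up, long right tail

for name, sample in (("symmetric", symmetric), ("skewed", skewed)):
    largest = max(abs(g) for g in reference_line_gaps(sample))
    # Express the largest gap in standard deviation units for comparison.
    print(name, round(largest / stdev(sample), 2))
```

Small gaps correspond to * covering the +; large gaps correspond to many + remaining visible.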
Departure from normality is evident when a number of + are visible and the data
values indicated by an * deviate from a straight line; see Figures 5.9 and 5.10. Notice
that for both the variables CORRD and CORRE the * do not cover most of the + signs
and the data values (*) deviate from a straight line. We could reasonably conclude that
the data for the percentage correct scores do not come from a normal distribution. The
pattern of the deviation of data points (*) provides a clue as to the shape of the underlying
distribution. The variable CORRE in Figure 5.10 has a negative or left skew and the data
points (*) form a curve from bottom left to top right, rising steeply at first and then
flattening off. The variable CORRD in Figure 5.9 has a distinct right or positive skew; the
data points (*) form a curve from bottom left to top right but rising slowly and then

