Statistical Methods for Psychology

display boxplots of lnSymptoms for each quintile of the Stress variable. This plot is shown
in Figure 9.4b. Given the fact that we only have about 20 data points in each quintile,
Figure 9.4b reflects the reasonableness of our assumptions quite well.
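
The quintile check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic stand-ins generated inside the script (they are not the Stress and Symptoms data from the text), the helper name quintile_groups is hypothetical, and the slope, intercept, and error variance are arbitrary. The sketch orders the cases by X, splits them into five equal-sized groups, and prints the median, interquartile range, and variance of Y in each group, which is the numerical counterpart of comparing side-by-side boxplots.

```python
import random
import statistics


def quintile_groups(x, y):
    """Split the y values into five groups by the rank order of x."""
    pairs = sorted(zip(x, y))  # order cases from lowest to highest x
    n = len(pairs)
    groups = []
    for k in range(5):
        lo, hi = k * n // 5, (k + 1) * n // 5
        groups.append([y_val for _, y_val in pairs[lo:hi]])
    return groups


# Synthetic stand-in data (assumption: NOT the data set used in the text).
rng = random.Random(1)
stress = [rng.uniform(0, 60) for _ in range(100)]
ln_symptoms = [4.3 + 0.009 * s + rng.gauss(0, 0.15) for s in stress]

groups = quintile_groups(stress, ln_symptoms)
labels = ["First", "Second", "Third", "Fourth", "Fifth"]
for label, g in zip(labels, groups):
    q1, med, q3 = statistics.quantiles(g, n=4)  # quartile cut points
    print(f"{label:<7} n={len(g):3d}  median={med:.3f}  "
          f"IQR={q3 - q1:.3f}  var={statistics.variance(g):.4f}")
```

Roughly equal spreads across the five groups are what homogeneity of variance in arrays would lead us to expect. With only about 20 cases per group, some fluctuation from group to group is to be expected even when the assumption holds.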
To anticipate what we will discuss in Chapter 11, note that our assumptions of homogeneity
of variance and normality in arrays are equivalent to the assumptions of homogeneity of
variance and normality of populations that we will make in discussing the analysis
of variance. In Chapter 11 we will assume that the treatment populations from which data
were drawn are normally distributed and all have the same variance. If you think of the
levels of X in Figure 9.4a and 9.4b as representing different experimental conditions, you
can see the relationship between the regression and analysis of variance assumptions.
The assumptions of normality and homogeneity of variance in arrays are associated
with the regression model, where we are dealing with fixed values of X. On the other hand,
when our interest is centered on the correlation between X and Y, we are dealing with the
bivariate model, in which X and Y are both random variables. In this case, we are primarily
concerned with using the sample correlation (r) as an estimate of the correlation coefficient
in the population (ρ). Here we will replace the regression model assumptions with the
assumption that we are sampling from a bivariate normal distribution.
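
The idea of r as an estimate of the population correlation under bivariate normal sampling can be illustrated with a short simulation. This is a minimal sketch, not a procedure taken from the text: the function names sample_bivariate_normal and pearson_r are hypothetical, both variables are assumed to be standardized (mean 0, variance 1), and the population correlation of 0.90 and the sample sizes are arbitrary choices for illustration. Each pair is built from two independent standard normal deviates so that the population correlation between X and Y equals the chosen value.

```python
import math
import random


def sample_bivariate_normal(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent N(0, 1) deviates
        x = z1
        y = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # Var(y) = 1, Corr(x, y) = rho
        pairs.append((x, y))
    return pairs


def pearson_r(pairs):
    """Sample Pearson correlation computed from sums of squares and cross-products."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
rho = 0.90  # assumed population correlation, chosen only for illustration
for n in (20, 200, 2000):
    r = pearson_r(sample_bivariate_normal(n, rho, rng))
    print(f"n={n:5d}  r={r:.3f}")
```

With larger samples the printed values of r should settle closer to the population value, which is the sense in which r serves as an estimate of the population correlation.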
The bivariate normal distribution looks roughly like the pictures you see each fall of
surplus wheat piled in the main street of some Midwestern town. The way the grain pile falls
off on all sides resembles a normal distribution. (If there were no correlation between X and
Y, the pile would look as though all the grain were dropped in the center of the pile and
spread out symmetrically in all directions. When X and Y are correlated the pile is elongated,
as when grain is dumped along a street and spreads out to the sides and down the ends.) An
example of a bivariate normal distribution with ρ = .90 is shown in Figure 9.5. If you were to
slice this distribution on a line corresponding to any given value of X, you would see that
the cut end is a normal distribution. You would also have a normal distribution if you sliced
the pile along a line corresponding to any given value of Y. These are called conditional
distributions because the first represents the distribution of Y given (conditional on) a
specific value of X, whereas the second represents the distribution of X conditional on a
specific value of Y. If, instead, we looked at all the values of Y regardless of X (or all
values of X regardless of Y), we would have what is called the marginal distribution of Y
(or X). For a bivariate normal distribution, both the conditional and the marginal
distributions will be normally distributed. (Recall that for the regression model we assumed
only normality of Y in

Section 9.8 Assumptions Underlying Regression and Correlation 265

Figure 9.4 a) Scatter diagram illustrating regression assumptions; b) Similar plot for the data on Stress
and Symptoms
[Panel a: Y plotted against X at the values X1 through X4, with the variance of Y marked at each
value of X. Panel b: boxplots of lnSymptoms (axis from 4.2 to 5.0) for the First through Fifth
Quintiles of Stress.]

