Statistical Analysis for Education and Psychology Researchers

linear and both variables are normally distributed. Correlations should always be
examined prior to more sophisticated multivariate analyses such as factor analysis or
principal component analysis. The extent of a linear relationship between two variables
may be difficult to judge from a scatterplot, and a correlation coefficient provides a more
succinct summary. However, it would be unwise to calculate a correlation
when a scatterplot depicts a clear non-linear relationship. When a researcher is
interested in both the extent and the significance of a correlation then r is used in an
inferential way as an estimate of the population correlation, ρ (rho).
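
The warning about non-linear relationships can be illustrated with a minimal sketch. The function name pearson_r and the data values below are invented for illustration and do not come from the text. The sketch computes the sample correlation r from its usual definition and shows that a perfectly predictable but curved relationship can still give r = 0.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation for two paired, equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only (not taken from the book)
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # exact straight line
y_curved = [v ** 2 for v in x]      # exact parabola, symmetric about x = 0

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(r_linear, r_curved)
```

Here Y is completely determined by X in both cases, yet only the straight-line data produce a large r. This is why the scatterplot should be inspected before the coefficient is interpreted.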


Statistical Inference and Null Hypothesis

As well as estimating the size of the population correlation we may want to test whether
it is statistically significant. In testing this hypothesis the same logic is followed as that
described in Chapter 7 when testing the significance of a nonparametric correlation. The
null hypothesis is H0: ρ=0, that is, the variable X is not linearly related to the variable Y.
The alternative hypothesis is H1: ρ≠0. The test of the null hypothesis asks whether any
apparent relationship between the variables X and Y could have arisen by chance. The
sampling distribution of r is not normal when the population correlation deviates from
zero and when sample sizes are small (n<30). For tests of significance r is transformed to
another statistic called Fisher’s z (which is not the same as the Z deviate for a normal
distribution).
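
As a rough sketch of how the transformation might be used, the code below applies the commonly used form of Fisher's z, the inverse hyperbolic tangent of r, with approximate standard error 1/sqrt(n-3). The exact procedure followed in this book is not shown in this passage, so this should be read as an assumption. The function name fisher_z_test and the example values r=0.5, n=28 are invented for illustration.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n):
    """Approximate two-sided test of H0: rho = 0 using Fisher's z.

    Returns (z_r, z_stat, p_value), where z_r is the transformed
    correlation and z_stat is z_r divided by its standard error.
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z_r = math.atanh(r)              # 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)        # approximate standard error of z_r
    z_stat = z_r / se                # referred to the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_r, z_stat, p_value


# Hypothetical sample values, for illustration only
z_r, z_stat, p_value = fisher_z_test(r=0.5, n=28)
print(z_r, z_stat, p_value)
```

Note that z_r is Fisher's transformed correlation, whereas z_stat is the quantity compared with the normal distribution; the two should not be confused.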


Assumptions

In some statistical texts for social scientists it is asserted that to use the Pearson
correlation both variables should have a normal distribution, yet other texts state that
the distributions of both variables should be symmetrical and unimodal but not
necessarily normal. These ideas cause great confusion to researchers and need to be
clarified. If the correlation statistic is to be used for descriptive purposes only, then
normality assumptions about the form of the data distributions are not necessary. The
only assumptions required are that



  • quantitative measures (interval or ratio level of measurement) are taken simultaneously
    on two or more random variables;

  • paired measurements for each subject are independent.


The results obtained would describe the extent to which a linear relationship would apply
to the sample data.
This same idea applies to the descriptive use of regression statistics. Should the
researcher wish to make any inference about the extent of a population linear relationship
between two variables or in a regression context to make a prediction which went beyond
the sample data, the following assumptions should be met:



  • Two random variables should be linearly related, but perfect linearity is not required as
    long as there is an obvious linear trend indicated by an elliptical scatter of points
    without any obvious curvature (look at the scatterplot).

  • The underlying probability distribution should be bivariate normal, that is, the
    distribution of the variable X and the distribution of the variable Y should be normal


Inferences involving continuous data 281