Statistical Analysis for Education and Psychology Researchers

One purpose of IDA is, as Chatfield (1993) comments, ‘to help you do a “proper”
analysis “properly”’ (p. 46). Even if a possible statistical model is identified a priori from
previous theoretical or empirical considerations, IDA should be used to check the
structure of the data and to identify whether variables are discrete or continuous, and
hence to confirm plausible underlying models such as the binomial, normal or bivariate
normal (for correlations). It is good practice to use as much relevant data as possible and
not to collapse variables, thereby transforming them to lower levels of measurement.
Outlier observations can have a drastic effect on statistical tests. For example, t-tests
are sensitive to extreme skewness and to outlier observations, especially with small
sample sizes. When fitting data to a statistical model, the effect of including and
excluding outliers should be checked.
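A simple way to make this check is to run the analysis both with and without the suspect
observations and compare the results. The following minimal sketch, written in Python
with scipy and entirely hypothetical data, illustrates how a single extreme value can
change a two-sample t-test when samples are small:

    # Illustrative sketch (hypothetical data): effect of one outlier on a
    # two-sample t-test with small samples.
    import numpy as np
    from scipy import stats

    group_a = np.array([12.0, 14.0, 11.0, 13.0, 15.0, 12.0])
    group_b = np.array([15.0, 16.0, 14.0, 17.0, 15.0, 16.0])

    # t-test on the data as collected
    t_all, p_all = stats.ttest_ind(group_a, group_b)

    # the same test after adding a single extreme value to group A
    group_a_outlier = np.append(group_a, 40.0)
    t_out, p_out = stats.ttest_ind(group_a_outlier, group_b)

    print(f"without outlier: t = {t_all:.2f}, p = {p_all:.3f}")
    print(f"with outlier:    t = {t_out:.2f}, p = {p_out:.3f}")

A marked change in the test statistic or p-value between the two runs is a signal that the
conclusion rests heavily on a few observations.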
Scatterplots of one variable against another indicate, visually, the extent of the
relationship between two variables and whether it is linear (a straight line can be drawn
through the cloud of points). Linearity is a necessary assumption for some statistical
procedures, such as linear regression and correlation. Histograms and stem-and-leaf plots
provide information on the distribution of variables; more sophisticated procedures for
testing the assumption of normality, that is, whether a variable is normally distributed in
the population, are presented at the end of this chapter.
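As a hedged illustration of this kind of visual screening, the sketch below uses Python
with numpy and matplotlib on invented data; the variable names and values are
hypothetical and stand in for whatever measures are being examined:

    # Illustrative sketch (hypothetical data): quick visual IDA.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    x = rng.normal(50, 10, size=100)          # e.g. a test score
    y = 0.8 * x + rng.normal(0, 5, size=100)  # a second, related measure

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

    # scatterplot: does the cloud of points look roughly linear?
    ax1.scatter(x, y, s=10)
    ax1.set_xlabel("variable x")
    ax1.set_ylabel("variable y")

    # histogram: is the distribution of x roughly symmetric and bell-shaped?
    ax2.hist(x, bins=15)
    ax2.set_xlabel("variable x")

    plt.tight_layout()
    plt.show()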
It is a common misunderstanding amongst new researchers that a criterion variable of
interest has to be normally distributed in your achieved sample. For example, in a two-
sample t-test it is often believed that the criterion variable should be normally distributed
in the two samples. This is not necessary. The general parametric model assumes that the
variable (or, more accurately, the errors) is normally distributed in the population, not
necessarily in your sample. More important is homogeneity of variance in the two
samples.
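One widely used check on homogeneity of variance is Levene’s test. The sketch below,
again in Python with scipy and hypothetical data, shows the basic call; the group values
are invented for illustration:

    # Illustrative sketch (hypothetical data): checking homogeneity of
    # variance for two groups with Levene's test.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    group_1 = rng.normal(100, 15, size=30)
    group_2 = rng.normal(105, 16, size=30)

    stat, p = stats.levene(group_1, group_2)
    print(f"Levene W = {stat:.2f}, p = {p:.3f}")
    # a small p-value casts doubt on the equal-variance assumption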
Specific procedures for checking statistical test assumptions, or more correctly the
assumptions underlying the probability model, are presented when the statistical
procedures are introduced in the following chapters. For now it is sufficient to note that
you should include checks for violations of assumptions in your inferential analysis.
Checks on these assumptions can be considered as an extension of IDA and as a
preliminary to inferential analysis. These checks usually take the form of residual plots.
A residual is the difference between an observed value and the value expected under the
statistical model that is used.
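To make the idea of a residual concrete, the following sketch fits a simple linear
regression in Python (scipy and matplotlib, hypothetical data) and plots the residuals
against the fitted values; a roughly random scatter about zero is consistent with the
model, whereas curvature or a fan shape suggests violated assumptions:

    # Illustrative sketch (hypothetical data): residuals from a simple
    # linear regression, residual = observed y minus model-predicted y.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 10, size=60)
    y = 2.0 + 1.5 * x + rng.normal(0, 1, size=60)

    fit = stats.linregress(x, y)
    predicted = fit.intercept + fit.slope * x
    residuals = y - predicted

    # residual plot: fitted values on the x-axis, residuals on the y-axis
    plt.scatter(predicted, residuals, s=10)
    plt.axhline(0, color="grey", linewidth=1)
    plt.xlabel("fitted value")
    plt.ylabel("residual")
    plt.show()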
Statistical test assumptions often have to be taken on trust by the reader in the absence
of evidence presented by the author(s). Given the robustness of parametric procedures
this is not usually seen as a problem. This view, however, is not without its critics, and
the issue of robustness is discussed in a later section. Regardless of whether or not you
view many parametric tests as robust, it is wise, when interpreting results, to remember
that the attained statistical significance (as stated in many articles and papers) depends
on the validity of the assumptions underlying the statistical model used (which are not
usually stated in journal papers). You should consider whether the underlying statistical
model is appropriate, as well as the statistical power of the tests used. All too often
assumptions are made implicitly which, on closer scrutiny, for example using IDA,
would render the statistical tests invalid and the conclusions spurious. Examples that
have appeared in the literature include: reporting Pearson correlations when the
relationship is clearly non-linear or when variables are clearly categorical, reporting
t-tests for

