Statistical Analysis for Education and Psychology Researchers

theoretical or empirical considerations and common sense. For example, an educational
researcher may want to know whether individual-level variables (for example, IQ,
gender, pre-school experience) and/or school-level variables (for example, class in
school, measure of school resources, teacher experience) can explain pupil achievement.
A research psychologist may be interested in the relationship between recent life events
(stressors) and work performance. In a more formal way data from personality appraisals
can be used to predict vocational success and may be used, amongst other criteria, in
personnel selection and guidance (Bernadin and Bownas, 1985). If y, the observed value
of a response variable, is an outcome or effect, and x₁, x₂ are particular values of
explanatory variables (or causes), then just because a regression model containing y and
x₁, x₂ fits the data, this does not mean that x₁, x₂ are the only causes or explanations of y.
There may be other important explanatory variables not in the model. Emphasis in
regression analysis should therefore be on model comparison and choice of the most
appropriate regression model. A decision on the most appropriate model will be based on
statistical results, theoretical and empirical considerations and common sense. It is
possible to have a regression model that fits well statistically but is nonsense. All too
common bad practice includes fitting a regression model to sparse data (too few
data points) and interpreting parameter estimates at values of the explanatory
variables that lie beyond the range observed in the sample data. We
should also not assume that there is only one ‘correct’ statistical model. For discussion of
model uncertainty and statistical inference see Chatfield (1995).


Statistical Inference and Null Hypothesis

Researchers are often interested in determining whether there is a relationship between a
response and an explanatory variable. In effect this is a test of the predictive ability
of the regression model. To determine the utility of the model we
test whether the regression slope is zero. The null hypothesis is H₀: β₁=0, and the
alternative hypothesis is that the explanatory variable makes a significant contribution to
the prediction of the response variable Y, namely H₁: β₁≠0. Inferences are based on the
sampling distribution of the regression statistic b₁, which is used to estimate the
population regression parameter β₁. The test statistic is b₁ divided by its standard
error, and it is evaluated against the t-distribution with n−2 degrees of freedom.
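
The slope test described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's worked example: the function name slope_t_test and the five data points are invented for the purpose, and only the standard library is used, so the statistic has to be compared with a tabled critical value of t.

```python
import math


def slope_t_test(x, y):
    """Least-squares slope, its standard error, t statistic and df
    for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Corrected sums of squares and cross-products
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b1 = sxy / sxx                 # estimate of the slope, beta_1
    b0 = mean_y - b1 * mean_x      # estimate of the intercept, beta_0

    # Residual (error) sum of squares and mean square for error
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    mse = sse / df

    se_b1 = math.sqrt(mse / sxx)   # standard error of b1
    t = b1 / se_b1                 # test statistic for H0: beta_1 = 0
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t": t, "df": df}


# Hypothetical data, invented purely for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(slope_t_test(x, y))
```

For these invented data the slope estimate is 0.6 and t is about 2.12 on 3 degrees of freedom. This is smaller than the two-tailed 5 per cent critical value of 3.182, so the null hypothesis of a zero slope would not be rejected.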
A second hypothesis is sometimes tested, whether the intercept is zero, but this is
generally of less interest. The null and alternative hypotheses are in this case
H₀: β₀=0 and H₁: β₀≠0. When there is more than one explanatory variable, the overall
model fit is evaluated with the F statistic. The null hypothesis tested involves all
regression parameters except the intercept. For example, if there were three explanatory
variables in the model then the null hypothesis would be H₀: β₁=β₂=β₃=0. The alternative
hypothesis would be that at least one of the parameters is not zero. The F statistic is
evaluated as the ratio of the mean square for model to the mean square for error (see worked
example).
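
The ratio of mean squares can be illustrated with a short Python sketch. The function name f_statistic and the sums of squares below are hypothetical, chosen only to show the arithmetic. The sketch assumes a model with an intercept and k explanatory variables, so the model has k degrees of freedom and the error has n−k−1.

```python
def f_statistic(ss_model, ss_error, n_obs, n_predictors):
    """Overall F statistic for a regression with an intercept.

    Returns (F, df_model, df_error), where F is the mean square for
    model divided by the mean square for error.
    """
    df_model = n_predictors
    df_error = n_obs - n_predictors - 1
    if df_model < 1 or df_error < 1:
        raise ValueError("not enough observations for this many predictors")

    ms_model = ss_model / df_model   # mean square for model
    ms_error = ss_error / df_error   # mean square for error
    return ms_model / ms_error, df_model, df_error


# Hypothetical sums of squares for a model with three explanatory
# variables fitted to 24 observations (values invented for illustration)
f_value, df1, df2 = f_statistic(ss_model=90.0, ss_error=120.0,
                                n_obs=24, n_predictors=3)
print(f_value, df1, df2)
```

With these invented values the mean square for model is 30 and the mean square for error is 6, giving F = 5 on 3 and 20 degrees of freedom. This value would then be compared with the tabled F distribution to test H₀: β₁=β₂=β₃=0.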

