Statistical Analysis for Education and Psychology Researchers

3.6 Statistical Models

Whereas the primary aim of IDA is to describe and summarize data, a secondary aim is to
suggest an appropriate underlying statistical model for the data, which will form the basis
of subsequent inferential statistical analysis. I follow Chatfield’s thinking here (Chatfield,
1985) and stress that IDA may be all that is required, particularly if the entire population
of interest is analyzed rather than a sample. Another situation in which IDA would suffice is
when data is of such poor quality that further inferential analysis would not be justified,
for example, when it is evident that there are non-random errors. Sometimes visual
scrutiny of the data and descriptive analysis are so clear-cut that further inferential
statistical analysis is unnecessary. If none of the above situations arises, then the researcher
should indeed consider what statistical models might be appropriate for further data
analysis.
A statistical model is a mathematical representation of the relationship between
variables in a population of interest or a mathematical expression for the shape of an
underlying population distribution of a variable. In reality, there may or may not be a
relationship between variables in a defined population. Similarly, a particular variable
which has the values 0 or 1, where 1 denotes ‘treatment success’ and 0 denotes ‘treatment
failure’, may or may not follow an underlying distribution such as the binomial
distribution or binomial model. (The binomial model, see Chapter 6, depends on the
underlying treatment success rate in the population of interest. If this were to change, the
binomial model would not be appropriate.) We collect data, usually by sampling from the
population, to see if the data fits our simplified statistical model. If the data does not fit
the model, we change the model, not the data.
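As a rough illustration of the 0/1 example, the sketch below simulates treatment
outcomes under a binomial model and compares the observed success rate with the rate
assumed by the model. The success rate of 0.6, the sample size of 20 and the function
names are assumptions chosen for illustration only; none of them comes from the text.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_outcomes(n, p, seed=0):
    """Draw n outcomes coded 1 (treatment success) or 0 (treatment failure)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


# Hypothetical values, assumed purely for illustration.
n_subjects = 20
p_success = 0.6

outcomes = simulate_outcomes(n_subjects, p_success)
successes = sum(outcomes)
sample_rate = successes / n_subjects

# The model assigns a probability to every possible number of successes;
# these probabilities sum to 1 (up to floating-point rounding).
pmf_total = sum(binomial_pmf(k, n_subjects, p_success) for k in range(n_subjects + 1))

print("observed successes:", successes)
print("observed success rate:", sample_rate)
print("model probability of this count:", binomial_pmf(successes, n_subjects, p_success))
```

If the observed count had a very small probability under the assumed rate, that would be
one informal sign that this particular binomial model does not fit the data.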
As a further illustration consider once again the example of the vocabulary teaching
methods experiment introduced in Chapter 1, Example 2. You may recall from this
earlier teaching methods example that the researcher’s basic question was concerned with
vocabulary acquisition in a population of 6-year-olds, not simply vocabulary acquisition
in the sample itself. The use of sample statistics to estimate corresponding population
parameters (properties of populations rather than samples) is thus central to
experimental design and statistical analysis.
We could argue that the vocabulary scores are subject to random, unsystematic
variation which makes them appear very much like random observations on a response
variable. The population formed by such a distribution of random observations is not real,
but can be thought of as a hypothetical population which such observations would generate.
We could put forward a statistical model to account for and explain this random variation.
The question we would need to consider is, Does this model yield an account of the
relationships in the data? In other words, Does our data fit our statistical model?
To interpret data we search for patterns. Any systematic effects such as those
attributable to teaching method may be blurred by other more haphazard variation. A
statistical model contains both systematic and random effects. We can say that a general
probabilistic statistical model has two components: a deterministic or effect component,
which represents the effects of variables in the model, and a random or error component,
which allows for random fluctuation of the variables in the model.
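The two components can be made concrete with a small simulation in which each score is
written as score = overall mean + method effect + random error. This is only a sketch
under stated assumptions: the normally distributed error, the overall mean of 50, the
5-point effect for the second method and all names used are hypothetical and are not
taken from the vocabulary experiment itself.

```python
import random
import statistics


def simulate_scores(n_per_group, grand_mean, effects, error_sd, seed=1):
    """Generate scores as a systematic part plus a random part.

    systematic part: grand_mean + effect of the teaching method
    random part:     a draw from a normal distribution with mean 0
    """
    rng = random.Random(seed)
    scores = {}
    for method, effect in effects.items():
        scores[method] = [
            grand_mean + effect + rng.gauss(0, error_sd)
            for _ in range(n_per_group)
        ]
    return scores


# Hypothetical parameter values, assumed purely for illustration.
effects = {"method_A": 0.0, "method_B": 5.0}
scores = simulate_scores(n_per_group=200, grand_mean=50.0, effects=effects, error_sd=10.0)

# Sample means estimate the systematic component of each group.
group_means = {method: statistics.mean(values) for method, values in scores.items()}

# The difference in sample means estimates the method effect,
# blurred to some degree by the random component.
estimated_effect = group_means["method_B"] - group_means["method_A"]

print("group means:", group_means)
print("estimated effect of method_B relative to method_A:", estimated_effect)
```

Increasing error_sd while holding the effect fixed shows how haphazard variation can
blur a systematic effect of a given size.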
The value of a statistical model is that it should suggest a simple summary of the data,
using parameters, in terms of both systematic effects and random effects. The problem

