explanation until a later section, this should not affect your grasp of how the basic ideas
of statistical inference and probability fit into the research process.
Estimation
Let us return again to the research process outlined in Figure 4.1. After having defined a
population of interest and specified research questions in terms of the variables to be
measured, the researcher then selects a sample, where possible a random probability
sample, from the population of interest (step 2). Both the concept of probability and the
idea of sampling variability are involved when a random sample is chosen from a
population. Sampling variability is sometimes referred to as sampling error in the
context of survey designs.
Random sampling in survey research is based on the idea of probability or chance, that
is, in a random sample each member of the target population has a known chance of
being selected into the sample. If an experiment were planned, the principle of
randomization would be appropriate (see, for example, experimental design in Chapter
1). When a researcher selects a random sample, calculates a statistic such as a mean and
then goes beyond the descriptive function of the statistic to use it to estimate the
population mean, this represents another aspect of statistical inference called estimation.
Put simply, estimation is when we use sample statistics to estimate the value of
population parameters. The formula we use to calculate the statistic is called the
estimator.
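
The distinction between an estimator (the formula) and an estimate (the value it produces for one sample) can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the population of 10,000 test scores is simulated, and the names sample_mean, population and estimate are hypothetical.

```python
import random
import statistics


def sample_mean(values):
    """Estimator: the formula applied to sample data to produce an estimate."""
    return sum(values) / len(values)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 test scores (simulated, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)  # the parameter, normally unknown

# Simple random sample: every member has the same known chance (100 / 10,000)
# of being selected.
sample = rng.sample(population, 100)
estimate = sample_mean(sample)  # the statistic used to estimate the parameter

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (estimate):      {estimate:.2f}")
```

In practice the population mean is not available; it is computed here only so that the estimate can be compared with the value it is trying to recover.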
For example, with reference to the research question at the beginning of this chapter
about the relationship between academic performance in primary and secondary school,
we could use a sample statistic, the Pearson correlation, r (a measure of the relationship
between two variables), to estimate the population correlation, ρ (rho). Any one sample
that is chosen randomly is very unlikely to be identical to another independent random
sample selected from the same population. It follows that if, for example, correlations
were calculated for two independent samples, it is unlikely that they would be the same.
This is because of sampling variability, also called sampling error (see Chapter 1, section
1.1). Sampling error is a feature of quantitative empirical research studies, and this
variability needs to be estimated so that we can tell how good an estimate any one
sample correlation is. The sampling error or standard error of a statistic also plays an
important role in some statistical tests, for example, the t-test.
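
Sampling variability in r can be seen directly by simulation. The following Python sketch is a minimal illustration under stated assumptions: the population is modelled as bivariate normal with a correlation of 0.6 (a value chosen only for the example), samples are of size 50, and the helper names pearson_r, draw_pair and sample_r are hypothetical. The standard deviation of the correlations across many independent samples gives an empirical estimate of the standard error of r.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


RHO = 0.6  # assumed population correlation, for illustration only
rng = random.Random(1)


def draw_pair():
    """One (x, y) observation from a bivariate normal model with correlation RHO."""
    x = rng.gauss(0, 1)
    y = RHO * x + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
    return x, y


def sample_r(n):
    """Draw an independent random sample of size n and return its correlation."""
    xs, ys = zip(*(draw_pair() for _ in range(n)))
    return pearson_r(xs, ys)


# Correlations from 1,000 independent samples of 50 observations each.
r_values = [sample_r(50) for _ in range(1000)]

# The spread of r across samples is the sampling variability of the statistic.
empirical_se = statistics.stdev(r_values)

print(f"first two sample correlations: {r_values[0]:.3f}, {r_values[1]:.3f}")
print(f"mean of sample correlations:   {statistics.fmean(r_values):.3f}")
print(f"empirical standard error of r: {empirical_se:.3f}")
```

A researcher usually has only one sample, so the standard error has to be estimated from that sample rather than by repeated sampling as in this sketch.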
Rather than using the statistic, r, to estimate the population correlation, we could use it
to test a hypothesis, for example, ‘Is the population correlation between primary and
secondary school performance equal to zero?’ Once again, using the idea of probability,
we can state with a specified degree of certainty whether it is reasonable to believe that
the population parameter is zero. We could of course propose that the population
parameter is some other non-zero value. In reality, the population parameter is likely to be
some true non-zero value, but we do not know this. The logic of hypothesis testing
demands that we assume the population correlation is zero and that we accumulate
evidence to refute this conjecture. The reason why we set about testing hypotheses in this
strange and convoluted way will be explained in section 4.7 (hypothesis testing).
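
As a preview of that logic, the Python sketch below tests the hypothesis that the population correlation is zero using the standard statistic t = r√(n − 2)/√(1 − r²), which has n − 2 degrees of freedom when the hypothesis is true. It is a sketch under assumptions, not a full treatment: the primary and secondary scores are simulated, the helper names are hypothetical, and the critical value of 2.011 is the approximate two-tailed 5 per cent point of the t distribution with 48 degrees of freedom.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing that the population correlation is zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


rng = random.Random(7)
n = 50

# Simulated standardized scores (hypothetical data, not from the text).
primary = [rng.gauss(0, 1) for _ in range(n)]
secondary = [0.6 * x + 0.8 * rng.gauss(0, 1) for x in primary]

r = pearson_r(primary, secondary)
t = t_for_r(r, n)

# Approximate two-tailed 5% critical value for t with 48 degrees of freedom.
T_CRIT = 2.011
reject_null = abs(t) > T_CRIT

print(f"r = {r:.3f}, t = {t:.3f}, df = {n - 2}")
print("reject the hypothesis that rho = 0" if reject_null
      else "insufficient evidence to reject the hypothesis that rho = 0")
```

The decision depends on both r and n: the same sample correlation yields a larger t, and so stronger evidence against a zero population correlation, when it is based on a larger sample.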
Statistical analysis for education and psychology researchers 88