It is necessary to use the notion of expectation when considering relationships between
samples and populations because the expectation of a sample statistic is used to estimate
the corresponding population parameter. Sometimes a population parameter is simply the
expected value of the corresponding sample statistic. This is the case with the mean. In
notational form this is $\mu = E(\bar{x})$, and in words this states that the population mean is
equal to the expected value of the sample average. Here the sample average is an
unbiased point estimator. A non-mathematical explanation of expectation is that if
samples were to be drawn repeatedly at random from a population (with replacement),
then the average of these sample averages would converge on the population mean.
That is, in the long run, the average of these sample averages is the expected
value, $E(\bar{x})$.
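The long-run idea described above can be illustrated with a small simulation. What follows is a minimal sketch, assuming a hypothetical population of 1,000 normally distributed scores. The function name `mean_of_sample_means`, the seeds, and all numeric values are illustrative assumptions and do not come from the text.

```python
import random
import statistics


def mean_of_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and average their means."""
    rng = random.Random(seed)
    sample_means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]
    return statistics.fmean(sample_means)


# Hypothetical population of 1,000 scores (illustrative values only).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(1000)]

mu = statistics.fmean(population)  # population mean
long_run = mean_of_sample_means(population, sample_size=25, n_samples=20000)

print(f"population mean:            {mu:.3f}")
print(f"average of sample averages: {long_run:.3f}")
```

Any single sample average may fall some way from the population mean, but the two printed values should agree closely, and the agreement should tighten as `n_samples` is increased.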
When choosing a statistic as an estimator of a parameter, four properties are desirable:
- The statistic should be unbiased. An unbiased statistic is an estimator whose
expected value equals the parameter to be estimated. The sample mean is an
unbiased estimator of the population mean.
- The statistic should be efficient. An efficient statistic is one whose values, over
repeated samples, cluster more tightly around the parameter than those of any competing
statistic. In a symmetric, approximately normal population both the median and the mean are
unbiased estimators of the population parameter μ, but the mean is more efficient. If we
select repeated random samples of equal size from a defined population and plot the
averages of each sample and the medians of each sample, we would find that the
averages cluster closer around the population mean than do the medians. The sample
average is therefore more efficient because a sample average is, in the long run, more
likely to be closer to the population mean than a sample median is.
- The statistic should be sufficient. A sufficient statistic is one which uses the maximum
amount of relevant sample information. The sample range uses only two values in a
distribution whereas the variance and standard deviation use all the values. Similarly,
the mean uses all the values but the mode uses only the most common observations.
The mean and variance therefore come closer to sufficiency than the mode and range.
- The statistic should be resistant. A resistant statistic is one that is little
influenced by extreme values in a distribution. As we have mentioned, the mean is
greatly influenced by extreme values whereas the median is relatively uninfluenced.
The median is therefore more resistant than the mean.
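Two of the properties listed above, efficiency and resistance, lend themselves to a quick numerical check. The sketch below assumes a hypothetical normal population (mean 100, standard deviation 15) and a small made-up set of scores. All names and values are illustrative assumptions, not taken from the text.

```python
import random
import statistics

rng = random.Random(1)
N_SAMPLES, SAMPLE_SIZE = 5000, 25

# Efficiency: compare how tightly sample means and sample medians cluster.
means, medians = [], []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(100, 15) for _ in range(SAMPLE_SIZE)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

sd_of_means = statistics.pstdev(means)      # spread of the sample averages
sd_of_medians = statistics.pstdev(medians)  # spread of the sample medians
print(f"spread of sample means:   {sd_of_means:.2f}")
print(f"spread of sample medians: {sd_of_medians:.2f}")

# Resistance: add one extreme value and see how far each statistic moves.
scores = [4, 5, 5, 6, 7]
with_outlier = scores + [60]
mean_shift = abs(statistics.fmean(with_outlier) - statistics.fmean(scores))
median_shift = abs(statistics.median(with_outlier) - statistics.median(scores))
print(f"shift in mean:   {mean_shift:.2f}")
print(f"shift in median: {median_shift:.2f}")
```

Under these assumptions the sample means should show the smaller spread (the mean is more efficient), while the single extreme score should move the mean far more than the median (the median is more resistant).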
You might think that all sample statistics are unbiased estimates of their corresponding
parameters, but this is not true. The sample variance and standard deviation, if calculated
with the sample size n as the denominator, are biased estimates of their respective
population parameters: on average they underestimate them. That is why the denominators in
formulae 3.1 and 3.2 are corrected by subtracting one from the sample size, i.e., n−1. This
is the number of degrees of freedom associated with the statistic. One degree of freedom is lost for
every parameter estimated from sample data and one degree of freedom is gained for
every independent observation. These are important considerations when designing
studies and choosing possible statistical models. We will return to these issues when we
consider regression and analysis of variance.
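The effect of the n−1 correction can be checked by simulation. The following is a minimal sketch, assuming a hypothetical normal population with variance 100 and samples of size 5. It relies on the Python standard library, where `statistics.pvariance` divides by n and `statistics.variance` divides by n−1. The seed and all numeric values are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(7)
SIGMA = 10                 # population standard deviation, so variance = 100
N_SAMPLES, n = 20000, 5    # many small samples make the bias easy to see

div_n, div_n_minus_1 = [], []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(0, SIGMA) for _ in range(n)]
    div_n.append(statistics.pvariance(sample))         # denominator n
    div_n_minus_1.append(statistics.variance(sample))  # denominator n - 1

avg_n = statistics.fmean(div_n)
avg_n_minus_1 = statistics.fmean(div_n_minus_1)

print(f"population variance:            {SIGMA ** 2}")
print(f"average estimate, divisor n:    {avg_n:.1f}")
print(f"average estimate, divisor n-1:  {avg_n_minus_1:.1f}")
```

With the divisor n the long-run average should settle near (n−1)/n of the population variance, here about 80, whereas the n−1 divisor should average close to 100.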