2.3 Selecting Simple Random Samples
method is preferred on the computer because generating new random
numbers is faster than calculating new partitions. So although it looks
wasteful, in practice it is usually computationally faster.
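As a rough illustration of the reject technique, the sketch below assumes that each uniform random number is mapped to one of six equal-width intervals (one per patient) and that a draw is discarded when it lands on a patient already selected. The function name `rejection_sample` and the seed are hypothetical choices made for this example, not part of the text.

```python
import random

def rejection_sample(population, n, rng):
    """Draw n distinct items using uniform draws, rejecting repeats."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    k = len(population)
    chosen = []
    while len(chosen) < n:
        u = rng.random()                 # uniform on [0, 1)
        idx = min(int(u * k), k - 1)     # index of the equal-width interval containing u
        item = population[idx]
        if item not in chosen:           # reject a draw that hits an already-selected item
            chosen.append(item)
    return chosen

patients = ["A", "B", "C", "D", "E", "F"]
sample = rejection_sample(patients, 4, random.Random(0))  # seed is arbitrary
print(sample)
```

The partition never has to be recomputed after a selection. The only cost is an occasional wasted draw, which matches the trade-off described above.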
Now for each patient, we are interested in a particular characteristic
that we can measure. For this example, we choose age in years at their
last birthday. Let us assume the ages for the patients are as follows:
A is 26
B is 17
C is 45
D is 70
E is 32
F is 9
The parameter of interest is the average age of the population. We
will estimate it using the sample mean. Since the population size is 6,
and we know the six values, the population mean, denoted as μ, is (26
+ 17 + 45 + 70 + 32 + 9)/6 = 33.1667. In our example, we will not
know the parameter value because we will only see the sample of size
4 and will not know the ages of the two patients that were not selected.
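The population mean can be checked with a few lines of Python. This is only a sketch; the dictionary name `ages` is introduced here for illustration.

```python
from statistics import mean

# Ages at last birthday for the six patients in the population
ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}

mu = mean(list(ages.values()))   # population mean: 199 / 6
print(round(mu, 4))              # prints 33.1667
```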
Now, if we generated the random sample using the exact random
numbers that we got from the reject technique, we would have {B, C,
E, F} as our sample, and the sample mean would be (17 + 45 +
32 + 9)/4 = 25.75. This is our estimate. It is considerably smaller than
the true population mean of 33.1667.
This is mainly because patient D is not in the sample. D is the oldest
patient at 70, so including him in the average raises the mean and
leaving him out lowers it. If we added D to the sample, the average
would be (17 + 45 + 70 + 32 + 9)/5 = 34.6. So adding D to the sample
increases the mean from 25.75 to 34.6. On the other hand, if we think
of the sample as being {B, C, D, E, F}, the removal of D drops the
mean from 34.6 to 25.75. So the influence of D is 8.85 years! This
shows that the mean is a parameter that is heavily influenced by
outliers. We will address this again later.
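The effect of patient D on the average can be reproduced directly. The sketch below recomputes the mean of {B, C, E, F} with and without D. The helper `sample_mean` and the variable names are hypothetical and used only for this illustration.

```python
from statistics import mean

ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}

def sample_mean(labels):
    """Mean age of the patients whose labels are given."""
    return mean([ages[p] for p in labels])

sample = ["B", "C", "E", "F"]
mean_without_d = sample_mean(sample)          # mean of the selected sample
mean_with_d = sample_mean(sample + ["D"])     # mean after adding the oldest patient
influence_of_d = mean_with_d - mean_without_d

print(mean_without_d, mean_with_d, round(influence_of_d, 2))
```

Swapping "D" for another label shows how much less a patient with a more typical age shifts the mean.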
The sample mean as an estimator is unbiased. That means that if
we averaged the estimate for the 15 possible samples of size 4, we